Robbie Blom

Robbie Blom

October 1, 2024

6 minute read

Pivot Tables? You Might Be Overlooking Insight

Pivot tables and crosstabs are one of the easiest and most informative ways to summarize data. However as quick and easy as they are, most people miss a powerful way of discovering which categories of the table are associated with each other.

Correspondence analysis visualizes these associations really nicely, and you can use it in diverse applications ranging from analyzing your brand and customers to breaking down political polls or experimenting with UX design.

How To Tease Out Relationships Among Categories

When you look at a pivot table, it's actually easier than you might think to draw misguided conclusions about the relationships among categories. We'll see this in an example and then see how correspondence analysis can help.

The Obvious Way: Looking At The Numbers

Consider an example of a beach shop that sells T-shirts, swimming suits, and plastic sunglasses. The shop has three types of customers segmented by age, and their sales for the month are shown below.

Beach Shop Sales By Product and Customer Age ($K)

T ShirtsSwimming SuitsPlastic Sunglasses
0-244.23.64.8
25-399.015.03.0
40+21.015.03.0

Based on the month's sales you might conclude:

  • T-shirts are most popular among 40+ year-olds: Because $21.0K is so much higher than $4.2K and $9.0K.
  • 25-39 year-olds and 40+ year-olds buy similar amounts of swimming suits and plastic sunglasses: Because they both buy $15.0K and $3.0K of each, respectively.
  • 0-24 year-olds buy moderately more plastic sunglasses: Because $4.8K is 60% higher than $3.0K.

These conclusions are in the right direction, but if you're experienced with these types of tables then you'll know that we're forgetting something: On the whole, 40+ year-olds buy more from the shop than 25-39 year-olds ($39K vs $27K) and 25-39 year-olds buy more than 0-24 year-olds ($27K vs $12.6K).

The table below shows this detail by summarizing the data across rows and columns.

T ShirtsSwimming SuitsPlastic SunglassesTotal%
0-244.23.64.8
25-399.015.03.0
40+21.015.03.0

What this means is that in our initial conclusions, we were comparing apples and oranges.

For example, although 0-24 year-olds buy 60% more plastic sunglasses, they also comprise the smallest segment of the customer base. That means the youngest customers are actually crazy about plastic sunglasses - way more than 60% would suggest!

What about T-shirts? 40+ year-olds clearly buy the most, but they're also the largest customer segment. Given that, on the whole, 40+ year-olds buy 15.2% more than 25-39 year-olds, it seems like the oldest segment genuinely buys more T-shirts. But what about the youngest segment? Is $4.2K more than we'd expect? Less?

How do we quantify if there's actually a difference in sales among age groups or if it's just the result of different-sized customer segments? What about quantifying differences among products within the same age group?

We can get close to these answers doing ad-hoc mental math, or we can use a tool designed for the job: correspondence analysis.

Correspondence Analysis: The Tool You Need

Correspondence analysis basically generates a chart that looks like the one below. If you've ever seen the Gartner Magic Quadrant, then you will have seen something similar.

Correspondence Analysis Chart

The key idea is that points that are close together are associated with each other, and this proximity takes different-sized customer segments into account. The center of the chart represents the scenario where each category is bought in proportion to the size of the customer segment.

The axes mathematically separate these points to highlight the differences among them. In this case, the horizontal axis separates the points by clothing vs other apparel, and the vertical axis separates T-shirts from swimming suits. Alternatively, we can interpret these axes as separating based on age group.

With this chart, we can revise our conclusions from before. We're able to confirm our first two conclusions, add another one, and fix the last one:

  • 40+ year-olds do indeed buy a disproportionate number of T-shirts: This is true because the points are close together and further from the center of the chart.
  • 25-39 year-olds and 40+ year-olds do indeed buy similar amounts of swimming suits and plastic sunglasses: This is true because the respective points are roughly the same distance from the sunglasses cluster along the x-axis and roughly equidistant from each other along the y-axis.
  • 25-39 year-olds buy a disproportionate number of swimming suits: This is true for the same reason as the first bullet.
  • 0-24 year-olds buy a largely disproportionate number of plastic sunglasses - not just moderately more: This is true because the points for 0-24 and plastic sunglasses are very far from the center of the chart.

We also learned a good deal about our customers: turns out that we can essentially summarize the whole table by saying that each age group mostly buys one thing, but there's still some cross-selling.

Other Applications of Correspondence Analysis

Correspondence analysis can be used anytime you have a pivot table or crosstab, which means that it can be used almost anywhere. Below are a few common applications that you might find useful or interesting:

  • Customer Analysis And Segmentation: This is the application we just worked through! Correspondence analysis can help you understand which customer segments are associated with which products or services.
  • Brand Perception: This is a really common and useful one. Think of a chart that plots your company and competitors with associated qualities like "innovative," "trustworthy," "expensive," etc. This can help make sure your brand is positioned where you want it to be.
  • Political Polls: Which party has disproportionately high support among which demographics? Correspondence analysis can help you understand this.
  • UX Design: Which types of users disproportionately take advantage of certain features? What does that mean for building new features? Correspondence analysis can help you understand this as well.

Conclusion

Pivot tables and crosstabs are great for summarizing data, but they can be tricky and hard to reason about when it comes to fully understanding relationships among categories. This can be especially difficult when you have large tables with many categories.

Correspondence analysis is a powerful tool that can help you understand these relationships and make better decisions based on them. You can use it anytime you're summarizing categorical data, so it can be applied in a wide variety of applications.

If you're looking to get more out of your pivot tables and make sure you're not overlooking insight, then correspondence analysis is a great tool to have in your toolbox.