Have you ever noticed how many companies rely on average metrics to measure their success? We see it all the time – from dashboards to reports, everything revolves around numbers like the average order value and the average customer lifetime value.
But here’s the thing: relying solely on the average can be a dangerous game. Why, you ask? Well, the average is heavily influenced by extreme values in the dataset. A handful of unusually small or unusually large values can pull it far away from what a typical customer actually does.
To give you an example, let’s say you run a business where customers make daily deposits. For our sample dataset, the average deposit amount is $61.00, and you might be tempted to run promotions based on that number. Maybe you’ll offer a bonus to customers who deposit more than the average amount – sounds like a great idea, right?
But hold on a minute. If you do that, you could end up tanking your profits or even going out of business. Why?
Because more than half of the users deposit less than the average amount. Here’s how you can tell:
When dealing with users, visitors, or customers, the median is often a better metric than the average. Instead of adding up all the values and dividing by the count, the median takes the middle value of the sorted data (or the average of the two middle values when the count is even). Because it depends on position rather than magnitude, a few extreme values barely move it.
For example, consider the dataset: 4, 5, 6, 8, 12, 15, 18. The median is 8.
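As a quick check, here is a minimal sketch using Python’s standard statistics module on the example dataset above. The second list, with_outlier, is a hypothetical variation invented for illustration: it swaps the largest value for an extreme one to show how the average reacts while the median stays put.

```python
from statistics import mean, median

values = [4, 5, 6, 8, 12, 15, 18]  # the example dataset from the text

# The median is the middle value of the sorted data.
middle = median(values)
average = mean(values)
print(middle)             # 8
print(round(average, 2))  # 9.71

# Hypothetical variation: replace the largest value with an extreme one.
with_outlier = [4, 5, 6, 8, 12, 15, 1800]
outlier_median = median(with_outlier)
outlier_average = mean(with_outlier)
print(outlier_median)             # 8
print(round(outlier_average, 2))  # 264.29
```

The median is unchanged by the extreme value, while the average jumps well above six of the seven values.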
The median gives us a better understanding of the deposits dataset mentioned earlier. In this case, the median is $9.76.
What does the median show? It tells us that 50% of customers make deposits of no more than $9.76, far below the $61.00 average.
With this, you have a much clearer picture of what’s happening with your customers’ deposits and what kind of promotional offer you can develop to improve the daily transactions.
The second solution is Quartiles. Quartiles are a way of dividing a set of data into four equal parts.
To find the quartiles, we first sort the data from smallest to largest and then find the median, which is the middle value. The median is also called the second quartile, or Q2. The first quartile, or Q1, is the median of the lower half of the data, and the third quartile, or Q3, is the median of the upper half of the data.
The quartiles help us describe the spread and variability of the data, as well as identify any outliers or extreme values. For example, the interquartile range, or IQR, is the difference between Q3 and Q1, and it measures how far the middle 50% of the data is spread out.
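The median-of-halves procedure described above can be sketched in a few lines of Python. The function name quartiles is hypothetical, and one detail is an assumption: when the number of values is odd, this sketch leaves the middle value out of both halves. Other tools use slightly different conventions and may return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for an odd count, the middle value is excluded
    from both the lower and the upper half. Needs at least 2 values.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # values before the middle position
    upper = ordered[(n + 1) // 2 :]  # values after the middle position
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([4, 5, 6, 8, 12, 15, 18])
iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3)  # 5 8 15
print(iqr)         # 10
```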
The quartiles for the deposits dataset are as follows:
– Quartile 1: $3.68
– Quartile 2: $9.76
– Quartile 3: $27.28
To interpret this data, you can use the following guide:
– 25% of customers make deposits below $3.68
– 50% of customers make deposits below $9.76
– 75% of customers make deposits below $27.28
The third solution is very similar to quartiles, with one primary difference: instead of splitting customers into four groups, it splits them into 100. These finer-grained “quartiles” are called percentiles.
Percentiles divide a dataset into 100 equal parts, based on the rank order of the values. For example, the 90th percentile is the value that separates the lowest 90% of the data from the highest 10%.
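To find the percentiles of a dataset, we first sort the data from smallest to largest. Then we locate the value at the desired rank; for the 90th percentile, that is the value 90% of the way through the sorted list. Different tools interpolate between neighbouring values in slightly different ways, so results can vary a little.
The 25th percentile is Quartile 1, the 50th percentile is Quartile 2 (the median), and the 75th percentile is Quartile 3.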
For example, the deposits dataset’s 95th percentile is $202.70. This means that 95% of customers deposit less than this amount.
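One simple way to compute a percentile is the nearest-rank method, sketched below. This is only one of several conventions and is not necessarily the one used to produce the figures in this article. The function name percentile is hypothetical, and the data is the small example list from the median section, not the deposits dataset.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that
    at least p percent of the data is at or below it (0 < p <= 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]

values = [4, 5, 6, 8, 12, 15, 18]
print(percentile(values, 25))  # 5
print(percentile(values, 50))  # 8
print(percentile(values, 75))  # 15
print(percentile(values, 90))  # 18
```

On this small list, the 25th, 50th, and 75th percentiles match the quartiles found with the median-of-halves method, although the two methods will not agree on every dataset.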
Last but not least: Maximum and Minimum
In order to gain a better understanding of a dataset, one solution is to find its maximum and minimum values.
To determine the maximum and minimum values of a dataset, we can arrange the data from smallest to largest and then examine the first and last values.
The maximum and minimum values of a dataset represent the highest and lowest values in the data, respectively. They are valuable in describing the range of the data, as well as in identifying any outliers or extreme values.
For example, in our dataset, the minimum value is $0.54 and the maximum value is $33,849.00.
Simply by comparing these values with the median of $9.76, we can see that a few very large deposits pull the average upward, so the average does not accurately describe a typical customer.
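To make this concrete, here is a small sketch with hypothetical deposit amounts, invented purely for illustration (they are not the article’s dataset). It reports the minimum and maximum, then counts how many deposits fall below the average.

```python
from statistics import mean, median

# Hypothetical deposits, invented for illustration only.
deposits = [2.0, 3.5, 5.0, 8.0, 10.0, 12.5, 1000.0]

# min() and max() give the same result as sorting and reading both ends.
lowest, highest = min(deposits), max(deposits)
average = mean(deposits)
middle = median(deposits)

# How many customers deposit less than the average?
below_average = sum(1 for d in deposits if d < average)

print(lowest, highest)                     # 2.0 1000.0
print(round(average, 2))                   # 148.71
print(middle)                              # 8.0
print(below_average, "of", len(deposits))  # 6 of 7
```

A single very large deposit is enough to push the average above six of the seven values, which is the same pattern the minimum and maximum hint at in the deposits dataset.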
Bonus: Whisker Plots
One of the most effective ways to visualize the metrics above is a whisker plot, also known as a box plot. It is a type of graphical display that shows the distribution and variability of a numerical dataset.
Whisker plots consist of a box that represents the middle 50% of the data, i.e., data between quartile 1 and quartile 3.
There are two lines, or whiskers, that extend from the box to the minimum and maximum values of the data, excluding any outliers.
The median is shown as a line inside the box, and any extreme values that are far from the rest of the data are represented by dots or stars beyond the whiskers.
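The numbers behind a whisker plot can be computed without any plotting library. The sketch below assumes the common convention that a value counts as an outlier when it lies more than 1.5 times the IQR beyond Q1 or Q3; the article does not say which rule its chart uses, so treat the factor k as an assumption. The function name box_plot_stats and the sample data are hypothetical.

```python
from statistics import median

def box_plot_stats(values, k=1.5):
    """Compute the parts of a whisker plot.

    Assumptions: quartiles use the median-of-halves method, and
    outliers are values more than k * IQR beyond Q1 or Q3.
    """
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q2 = median(ordered)
    q3 = median(ordered[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }

# Hypothetical data: the earlier example list plus one extreme value.
stats = box_plot_stats([4, 5, 6, 8, 12, 15, 18, 60])
print(stats["q1"], stats["median"], stats["q3"])    # 5.5 10.0 16.5
print(stats["whisker_low"], stats["whisker_high"])  # 4 18
print(stats["outliers"])                            # [60]
```

Here the box spans 5.5 to 16.5, the whiskers reach 4 and 18, and 60 would be drawn as a separate dot beyond the upper whisker.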
Whisker plots are useful in comparing different data sets, identifying the skewness or symmetry of the data, and measuring the spread or range of the data.
Below you can see the whisker plot for the sample data used in this example. It clearly displays multiple metrics in one view, allowing you to make much better decisions around potential promotions, life-cycle optimization, and other business areas.
To put it simply, using the average alone to understand your data can be misleading. In most cases, it’s best to rely on other metrics such as the median, quartiles, percentiles, and minimum and maximum values. A helpful way to visualize all these metrics in a single chart is a whisker plot.