This is another one of those blog posts that I hope will help you do research; it’s not necessarily about DTRAP. It’s also not directly about a cage match, sorry!
If you’ve ever taken statistics, among the first things you learn are mean, median and mode. I’m going to talk about what they mean, as opposed to how to compute them.
It all goes back to Carl Friedrich Gauss. It was a warm summer’s evening when…
Yeah, I won’t do that. What I will talk about is the Normal Distribution, also known as the Gaussian distribution or the Bell Curve.
That’s our friend the normal distribution. One property it has is that the mean, median and the mode have the same value, right there in the center of the distribution. In this example, the mean, median and mode are all 0.
That means the balance of the data is in the center. The mean of a distribution is its balance point: if you rest the distribution on a point at the mean, the weight on one side balances the weight on the other.
The median is where the middle of the data is. It’s a counting exercise that knows nothing about the values in the data itself. The mode is a similar counting exercise. You look for the value in your data with the highest frequency.
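I said I wouldn’t dwell on how to compute them, but a tiny sketch makes the three definitions concrete. It uses Python’s standard statistics module on a made-up, roughly bell-shaped sample (the numbers are my own illustration, not data from this post), so all three land on the same value.

```python
import statistics

# Hypothetical, roughly symmetric sample centred on 0 (illustration only).
data = [-2, -1, -1, 0, 0, 0, 1, 1, 2]

mean = statistics.mean(data)      # balance point: sum of the values / count
median = statistics.median(data)  # middle value once the data is sorted
mode = statistics.mode(data)      # value that occurs most often

print(mean, median, mode)
```

For this symmetric sample the mean, median and mode all come out as 0, just like the bell curve above.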
In the case of a normal distribution, everything is balanced at the center and we’re all happy about it. But life and data aren’t often normal, so…
What if the mean was 73.15 and the median was 2? What does the data look like then?
In this case, it looks like that picture. The data has a few points that are really large; other than that, they’re mostly kind of small. The large data points skew the mean though.
However, if we look at the median, it’s 2. That means half the data points are 2 or smaller than that and half are larger. But if we balance the data on a stick, that’s going to happen around 73.15 and there’s a lot of data on the left side and not so much on the right.
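Here is a minimal sketch of that balancing act. The sample is made up, chosen only so that its mean and median match the 73.15 and 2 above; it is not the data behind the picture. The names left_pull and right_pull are mine too: they add up how far the points on each side sit from the mean, which is what the stick "feels".

```python
import statistics

# Made-up sample, picked only so the mean and median match the numbers in
# this post. It is NOT the data behind the original picture.
data = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5, 524, 900]

mean = statistics.mean(data)
median = statistics.median(data)

# Split the points by which side of the balance point they fall on.
left = [x for x in data if x < mean]
right = [x for x in data if x > mean]

# Total distance from the mean on each side: the leverage on the stick.
left_pull = sum(mean - x for x in left)
right_pull = sum(x - mean for x in right)

print(f"mean={mean:.2f} median={median}")
print(f"{len(left)} points left of the mean, {len(right)} to the right")
print(f"pull left={left_pull:.2f}, pull right={right_pull:.2f}")
```

In this sample, eighteen small points sit to the left of the mean and only two large ones to the right, yet the two pulls come out equal. That is how a couple of big values can drag the balance point so far from the median.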
Mean and median tell us two things about where the center of the data is. One looks at the weight, or balance of the data and the other counts and says “Here, in this spot, this is the middle of your data”.
Which leads me to my next point.
If we rely just on mean and median without actually plotting the data, we don’t actually know what it looks like. A single number doesn’t describe a set of data, as much as we’d like it to.
The two numbers combined can tell us something though, even if we didn’t have that plot. The median of 2 tells me that half the numbers in my data set are small. The mean of 73.15 combined with that tells me there are some really big numbers in that data because otherwise, I’d expect it to be around 2… if my data was normally distributed.
One number isn’t enough to describe data without more context. Keep that in mind.