Regression To The Mean

Takeaway Points:

  • Regression to the mean is an important concept in statistics that helps to explain why a small amount of data is unreliable, but gathering more data allows us to get a much better picture of how things work.

  • Many people intuitively misunderstand this concept in real life, causing them to make false assumptions of causation when there is none, or to become discouraged by streaks of bad luck.

  • The more info you have, the more accurate a picture you can make - so long as you're focused on relevant data.


One commonly misunderstood concept in statistics, which applies widely in many other fields, is the concept of regression to the mean.

What this means is that when you observe an unusual piece of data - one that is significantly far away from the average - the general tendency will be for later measurements of the same thing to be “more normal”. In short, extreme results tend to be followed by results that are closer to the average.

Regression to the mean can be found basically everywhere.

An example from medicine: Most illnesses, diseases, and injuries represent an outlier data point where you are significantly below your average health. This means that, so long as your condition isn’t getting worse (and you don’t die), we would expect that, over time, you will generally trend towards being healthier as your body continues to fight the illness or repair the injury.

While this may seem so simple as to be uninteresting (if you’re not getting worse, you’re getting better), it has a lot of implications that people DON’T think about when they’re sick. For example, let’s say that you’re sick and you take a pill that a magician tells you will cure you. Within a few days, you’re cured!

However, chances are that it wasn’t the pill - it was just regression to the mean. To determine whether the pill really works, you would need to test it against other pills and placebos, by taking a variety of people with the same sickness and giving them different treatments (aka, perform some scientific studies). You need to see whether people tend to recover faster on average with the given cure, and whether or not this outperforms placebo.
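To make the comparison above concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that the pill does nothing and that everyone recovers naturally in about 7 days (give or take 2); the function names and all of the numbers are made up for this example.

```python
import random
from statistics import mean


def simulate_recovery_days(n_people, rng, mean_days=7.0, sd_days=2.0):
    """Days until each person recovers on their own (hypothetical numbers)."""
    # Clamp at 1 day so nobody recovers in zero or negative time.
    return [max(1.0, rng.gauss(mean_days, sd_days)) for _ in range(n_people)]


def compare_groups(n_per_group=5000, seed=0):
    """Average recovery time for a pill group and a placebo group."""
    rng = random.Random(seed)
    # Assumption for this sketch: the pill has no effect, so both groups
    # recover according to exactly the same natural process.
    pill_group = simulate_recovery_days(n_per_group, rng)
    placebo_group = simulate_recovery_days(n_per_group, rng)
    return mean(pill_group), mean(placebo_group)


if __name__ == "__main__":
    pill_avg, placebo_avg = compare_groups()
    print(f"Average days to recover with the pill:    {pill_avg:.2f}")
    print(f"Average days to recover with the placebo: {placebo_avg:.2f}")
```

Everyone in the pill group gets better after taking the pill, which is exactly what makes the pill look like it works. Only the side-by-side comparison with the placebo group shows that the two averages are practically the same.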

Sore from exercise? Get a massage, that will help! (This ignores the fact that your normal state is not being sore, and that your body is already trending back toward that normal state as it recovers from exercise.)

From these examples you can see a basic trend - if you're significantly "overperforming", then you can expect your future results to fall back toward your average. This isn't because bad luck shows up to balance things out; it's because the good luck that inflated your results is unlikely to repeat. On the other hand, if you're significantly "underperforming" (as in injury, soreness, or illness), then you can expect your future results to drift back up toward your average.
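Here is a rough simulation sketch of that trend. It assumes a deliberately simple, hypothetical model in which every result is a stable "skill" plus a one-off dose of "luck", both drawn from a normal distribution; the function name and the numbers are illustrative only.

```python
import random
from statistics import mean


def regression_demo(n_people=10_000, top_fraction=0.1, seed=0):
    """Compare the top first-attempt performers with their second attempt."""
    rng = random.Random(seed)
    # Hypothetical model: each observed score = stable skill + one-off luck.
    skill = [rng.gauss(0, 1) for _ in range(n_people)]
    first = [s + rng.gauss(0, 1) for s in skill]
    second = [s + rng.gauss(0, 1) for s in skill]  # fresh luck, same skill

    # Pick the people who looked best on the first attempt.
    n_top = int(n_people * top_fraction)
    top = sorted(range(n_people), key=lambda i: first[i], reverse=True)[:n_top]

    return {
        "everyone_avg": mean(first),
        "top_first_avg": mean(first[i] for i in top),
        "top_second_avg": mean(second[i] for i in top),
    }


if __name__ == "__main__":
    for name, value in regression_demo().items():
        print(f"{name}: {value:.2f}")
```

In this model the top group's second-attempt average falls well below its first-attempt average, yet stays above the overall average. Nothing "corrects" them: part of what put them on top was luck, and luck doesn't carry over to the next attempt.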

This can be a problem when streaks of good luck encourage us to take risky moves that assume our luck will continue, and when streaks of bad luck discourage us from continuing because we assume that our bad luck will continue.

In general, the more data you have about something, the closer your observed average tends to be to the true underlying average.

A coin has a 50% chance of flipping heads, and a 50% chance of flipping tails. Luckily, we know what the actual average result should be beforehand, which is not common with many of the actions we take in real life. So, if we flip the coin just once, then we’ll get very flawed data - if we had only a single coin flip to test, we might think that our coin “always flips heads” or “always flips tails”, and that would be a reasonable conclusion from the data we have, even though it's wrong.

But when you flip the coin many more times (10, 100, 1000), you have a lot more data - and the actual heads/tails ratio is progressively more likely to be close to that ideal 50/50 ratio, even if there are sometimes big variations in smaller subsets of the data.
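You can try this yourself with a few lines of code. The sketch below simulates a fair coin using Python's standard random number generator; the function name and the seed are arbitrary choices for this example.

```python
import random


def heads_fraction(n_flips, rng):
    """Fraction of heads in n_flips simulated flips of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


if __name__ == "__main__":
    rng = random.Random(42)  # arbitrary seed, so the run is repeatable
    for n in (1, 10, 100, 1000, 100_000):
        print(f"{n:>7} flips: {heads_fraction(n, rng):.3f} heads")
```

With a single flip the fraction can only be 0 or 1. As the number of flips grows, the fraction should settle closer and closer to 0.5, though the exact numbers depend on the seed.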

Likewise, the average roll of a six-sided die is 3.5 ((1+2+3+4+5+6)/6). A single roll of the die may give you a very different number, but over time, with enough rolls, the average value will tend to be much closer to 3.5.

To nerd out for a second, this is why a fireball in Dungeons and Dragons (which hits an enemy for a total damage output of one six-sided die per character level) is somewhat unreliable at earlier levels (a single roll of a six-sided die can produce very different results), but the damage becomes much more consistent at higher levels (at character level 20, the total of 20 six-sided dice will tend to land much closer, proportionally, to its expected value). This is called the law of large numbers.
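The dice example can be worked out exactly rather than simulated. The sketch below assumes fair, independent dice and computes the average total and its typical spread (standard deviation); the function name is just for illustration.

```python
from math import sqrt


def dice_total_stats(n_dice, sides=6):
    """Exact mean and standard deviation of the total of n fair dice."""
    faces = range(1, sides + 1)
    mean_one = sum(faces) / sides  # 3.5 for a six-sided die
    var_one = sum((f - mean_one) ** 2 for f in faces) / sides
    # Independent rolls: means add up, and so do variances.
    return n_dice * mean_one, sqrt(n_dice * var_one)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        mean_total, sd_total = dice_total_stats(n)
        print(f"{n:>2}d6: mean {mean_total:5.1f}, sd {sd_total:4.2f}, "
              f"sd as share of mean {sd_total / mean_total:.0%}")
```

The spread in raw points actually grows as you add dice, but it shrinks relative to the total: roughly 49% of the average for one die, and roughly 11% for twenty. That relative shrinkage is what makes the bigger roll feel more consistent.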

There are many instances in which regression to the mean plays out in our everyday life. Wherever events occur multiple times, regression to the mean comes into play.

This is why a weekly rolling average of your weight gives a more accurate picture of your weight loss/gain than weighing yourself occasionally and trying to guess the trend from those readings. This is why, when we have very little scientific data on a subject, that data can very easily be misused to support claims it doesn’t actually support - and why, when we have a lot more data, the truth becomes much clearer. It’s why many of us are tricked into believing in fake cures, and why we infer cause and effect between two events that merely happened one after the other. This is why the stock market is very volatile in the short term, but somewhat predictable in the long run. It's why you should always be cautious about things you don’t know much about. It’s why bad businesses can often thrive in the short term, but tend to fall apart in the long term.
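For the weight example, a rolling average is simple to compute. This is a minimal sketch, and the daily weigh-ins in it are made-up numbers chosen only to show a slow trend buried under day-to-day noise.

```python
def rolling_average(values, window=7):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Made-up daily weigh-ins (kg): a slow downward trend plus daily noise.
    daily = [80.4, 79.6, 80.9, 80.1, 79.3, 80.6, 79.8,
             79.9, 79.1, 80.3, 79.5, 78.9, 80.0, 79.2]
    for avg in rolling_average(daily):
        print(f"{avg:.2f}")
```

The individual readings jump around by a kilogram or more from day to day, while the 7-day averages move much more smoothly, which makes the underlying direction easier to see.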

Reserving judgment until you’ve gathered more data is the essence of understanding regression to the mean - and the hallmark of a good mind. Learn to practice seeing regression to the mean when examining the world around you.

