I often notice people mistaking regression to the mean for a real cause-and-effect relationship. It is critical to tell the two apart; otherwise we draw the wrong conclusions.
When I read about it a year or so back, it changed the way I looked at data and the world. So here is a short post about it.
What is it?
Regression to the mean is a common statistical phenomenon that can influence or misguide us when we observe the world. Learning to recognize when it is at play can help us avoid seeing patterns that don’t exist.
In technical terms, it describes how a random variable that lands far from its mean on one measurement tends to land closer to the mean on the next; in other words, “extreme” outcomes tend to be followed by more “normal” ones.
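In its simplest form, this means that an unusually high or low result is more likely to be followed by an ordinary one than by another extreme. It does not mean that a period of success has to be paid back with a period of failure; each new outcome simply tends to sit near the average, so extremes rarely last.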
Why does it happen?
Most often it shows up because of sampling error. A good sampling technique is to sample randomly from the whole population. With a small or skewed sample, the results can come out abnormally high or low relative to the true average, and later measurements then regress back toward the mean. The effect becomes hard to notice when the sample is large enough, because the chance component averages out.
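To make the sample-size point concrete, here is a minimal simulation sketch. The population (normal, mean 50, standard deviation 10), the sample sizes, and the function names are all assumptions I picked for illustration; they do not come from any real dataset.

```python
import random
import statistics

def sample_mean(rng, n, mu=50.0, sigma=10.0):
    """Average of n independent draws from an assumed normal population."""
    return statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])

def spread_of_sample_means(n, trials=2000, seed=0):
    """Standard deviation of the sample mean across many repeated samples of size n."""
    rng = random.Random(seed)
    means = [sample_mean(rng, n) for _ in range(trials)]
    return statistics.pstdev(means)

# Hypothetical sample sizes: a handful of observations versus a couple of hundred.
small = spread_of_sample_means(5)
large = spread_of_sample_means(200)

print(f"spread of averages with n=5:   {small:.2f}")
print(f"spread of averages with n=200: {large:.2f}")
```

Averages of five observations wander much further from the true mean than averages of two hundred, so a small sample is far more likely to hand us an "extreme" result that the next measurement fails to repeat.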
What does it mean in real life?
I have seen this have serious implications in hiring, performance evaluation, business decisions, and other areas. We can notice the phenomenon in sports, trading, and almost every aspect of our lives.
As humans, we want to attribute success to talent and not to luck.
Luck plays a large role in every story of success; it is almost always easy to identify a small change in the story that would have turned a remarkable achievement into a mediocre outcome. – Daniel Kahneman
When hiring, we should rely on track record and consistency in past experience rather than on the outcome of one specific interview question. Interviews often judge a skill or hiring criterion from a single data point, which is not enough data to make a hiring decision.
When assessing the performance of an individual or a team, we should look at the whole year rather than fall prey to recency bias. As we have seen, a single good or bad result in recent times does not guarantee consistent future performance.
This also shows up in business or product decisions, where someone makes directionally motivated decisions and forgets that correlation does not imply causation. So in product or business decisions, one must question the inferences being made and understand what is driving the correlation.
How can we identify it?
To understand regression to the mean, we must first understand correlation.
Daniel Kahneman observed a general rule:
Whenever the correlation between two scores is imperfect, there will be regression to the mean.
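Here is a small sketch of that rule in action. It assumes a deliberately simple model that is my own and not Kahneman's: each score is a fixed skill plus independent luck, with both drawn from a standard normal distribution. All names and numbers are hypothetical.

```python
import random
import statistics

def simulate_scores(n=20000, skill_sd=1.0, luck_sd=1.0, seed=1):
    """Two scores per person: same skill both times, fresh luck each time (assumed model)."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        skill = rng.gauss(0.0, skill_sd)
        first.append(skill + rng.gauss(0.0, luck_sd))
        second.append(skill + rng.gauss(0.0, luck_sd))
    return first, second

def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

first, second = simulate_scores()
r = pearson(first, second)

# Pick the top 10% on the first score and see how the same people do the second time.
order = sorted(range(len(first)), key=lambda i: first[i], reverse=True)
top = order[: len(order) // 10]
top_first = statistics.fmean([first[i] for i in top])
top_second = statistics.fmean([second[i] for i in top])

print(f"correlation between the two scores: {r:.2f}")
print(f"top group, first score:  {top_first:.2f}")
print(f"top group, second score: {top_second:.2f}")
```

With equal parts skill and luck the correlation is about 0.5 by construction, and the top group keeps only about that fraction of its lead the second time around. Nobody got worse; the luck that helped put them in the top group just did not repeat.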
How to avoid it?
Creating a control group can easily solve this in a product, but one should be careful not to fall prey to directional motivation and should capture a sufficiently large number of data points. The control group is expected to improve or degrade by regression alone, so only the difference between it and the treated group can be credited to the change being tested.
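The control-group idea can be sketched as a simulation as well. Everything here is an assumption made up for illustration: the metric is a stable level plus noise, we pick the worst 10% of units at baseline, split them randomly in half, and give one half a change with a known true effect of 0.3.

```python
import random
import statistics

def run_experiment(n=20000, true_effect=0.3, seed=2):
    """Return the mean before/after change for the control and the treated group."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        level = rng.gauss(0.0, 1.0)             # stable underlying level (assumed)
        baseline = level + rng.gauss(0.0, 1.0)  # noisy baseline measurement
        units.append((level, baseline))

    # Select the worst performers at baseline, as one often does before a "fix".
    units.sort(key=lambda u: u[1])
    selected = units[: n // 10]

    # Random assignment, so both groups are subject to the same regression.
    rng.shuffle(selected)
    half = len(selected) // 2
    control, treated = selected[:half], selected[half:]

    def mean_change(group, effect):
        # Follow-up = same level + fresh noise + any real effect of the change.
        return statistics.fmean(
            [level + rng.gauss(0.0, 1.0) + effect - baseline for level, baseline in group]
        )

    return mean_change(control, 0.0), mean_change(treated, true_effect)

control_change, treated_change = run_experiment()

print(f"control improved by: {control_change:.2f}")
print(f"treated improved by: {treated_change:.2f}")
print(f"estimated effect:    {treated_change - control_change:.2f}")
```

The control group improves noticeably even though nothing was done to it. A plain before/after comparison on the treated group would claim all of that improvement for the change, while subtracting the control group's change leaves an estimate close to the true effect.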
But in the case of people’s performance, the only real benchmark available is past performance, as a control group is not feasible. This makes the effects of regression difficult or impossible to isolate. So observe and assess people over a longer period and across multiple tasks or projects.
To conclude, if there is one statistical concept that everyone should be aware of, I feel it should be regression to the mean. Its most important lesson is the value of track records over one-off success stories. Knowing about regression to the mean is in itself a good step toward telling luck from performance.