Regression to the Mean | Definition & Examples
Regression to the mean (RTM) is a statistical phenomenon in which observations that are much higher or lower than the mean on a first measurement tend to be closer to the mean when measured a second time.
Regression to the mean is due to natural variation or chance. It can be observed in everyday life and is especially pronounced in research that intentionally focuses on the most extreme cases or events. It is sometimes also called regression toward the mean.
Regression to the mean is common in repeated measurements (within-subject designs) and should always be considered as a possible cause of an observed change. It is considered a type of information bias and can distort research findings.
What is regression to the mean?
Regression to the mean is observed when values that are far above or far below average on the first measurement move closer to the average on the second measurement.
In general, RTM explains why unusual events are likely to be followed by more typical ones. Suppose that a company has a great quarter, exceeding all targets set. As exceptional performance is difficult to maintain over time, there is a high chance of worse performance in the next quarter, shifting the performance of the company back towards the mean. Anything that can be influenced by an element of chance is subject to this phenomenon.
Regression to the mean occurs when a nonrandom sample is selected from a population and you measure two imperfectly correlated variables, such as two consecutive blood pressure measurements.
- The smaller the correlation between the two variables, the larger the effect of RTM.
- The more extreme the value from the population mean, the more room there is to regress to the mean.
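The two points above can be illustrated with a small simulation. This is a minimal sketch, not a definitive model: it assumes both measurements are standard normal with a chosen correlation r, and the function name `simulate_rtm` and its parameters are hypothetical.

```python
import random
import statistics


def simulate_rtm(r, n=20000, top_fraction=0.1, seed=0):
    """Return the mean first and second measurement of the highest scorers.

    Assumption: both measurements are standard normal with correlation r.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        first = rng.gauss(0, 1)
        # Build a second measurement whose correlation with the first is r.
        second = r * first + (1 - r ** 2) ** 0.5 * rng.gauss(0, 1)
        pairs.append((first, second))

    # Deliberately nonrandom selection: keep only the most extreme first scores.
    pairs.sort(key=lambda p: p[0], reverse=True)
    top = pairs[: int(n * top_fraction)]
    mean_first = statistics.mean(p[0] for p in top)
    mean_second = statistics.mean(p[1] for p in top)
    return mean_first, mean_second


for r in (0.9, 0.3):
    for fraction in (0.10, 0.01):
        first, second = simulate_rtm(r, top_fraction=fraction)
        print(f"r={r}, top {fraction:.0%}: first={first:.2f}, "
              f"second={second:.2f}, shift={first - second:.2f}")
```

In this setup, the shift back towards the mean should be larger for the weaker correlation and for the more extreme selection.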
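Regression to the mean can be explained by considering, for example, that skill and performance are imperfectly correlated due to the role of luck. An unusually good or bad performance partly reflects luck, which is unlikely to repeat the next time. Overlooking this may lead you to find a causal relationship where there isn’t one.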
Why is regression to the mean a problem?
Regression to the mean can prove problematic particularly in research studies that measure the effectiveness of an intervention, program, or policy.
It can mislead researchers into believing that an intervention is the cause of an observed change, when in reality the change is due to chance. This is particularly evident when researchers focus on measurements of people, cases, or organizations at the extremes, such as the worst-performing, the best-educated, or the unhealthiest.
RTM shows us that, statistically, the lowest cases are likely to improve the second time, while those at their peak will likely perform worse even without the intervention. Because it can distort results, you need to take regression to the mean into account when designing your research as well as when analyzing your findings.
Otherwise, you run the risk of attributing certain results to a particular cause, when in reality they were most likely due to chance.
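To see how this can happen, the sketch below retests the lowest scorers on a test without applying any intervention at all. It assumes each score is a stable skill plus random luck, and all names and numbers are invented for illustration.

```python
import random
import statistics


def retest_lowest_scorers(n=10000, fraction=0.1, seed=1):
    """Pick the lowest first-test scorers and retest them with no intervention.

    Assumption: score = stable skill + independent luck on each test.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(50, 10) for _ in range(n)]
    first = [s + rng.gauss(0, 10) for s in skill]
    second = [s + rng.gauss(0, 10) for s in skill]

    # Select the worst performers based on the first test only.
    order = sorted(range(n), key=lambda i: first[i])
    lowest = order[: int(n * fraction)]

    first_mean = statistics.mean(first[i] for i in lowest)
    second_mean = statistics.mean(second[i] for i in lowest)
    return first_mean, second_mean


before, after = retest_lowest_scorers()
print(f"Lowest scorers, first test:  {before:.1f}")
print(f"Same people, second test:    {after:.1f}")
```

The selected group scores higher on the second test even though nothing was done to them, which is the kind of change that could be mistaken for an intervention effect.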
Regression to the mean examples
Regression to the mean often happens when measuring the effects of an intervention.
For this reason, randomized evaluations are essential for separating regression to the mean from the true effect of an intervention.
How to avoid regression to the mean
The best way to avoid regression to the mean is to account for it during the design phase of your research. Whenever possible, use a probability sampling method. Otherwise, your sample may lean towards the extremes, with values that are abnormally high or abnormally low relative to the average. These values will tend to regress towards the mean upon retesting.
In experimental designs, it is important to use a control group and a treatment group, randomly assigning participants to each one. Changes in the control group can help you evaluate the extent of changes caused by regression to the mean, in addition to any placebo effect. Any extra improvement or deterioration in the experimental group compared to the control group can be attributed to the intervention, so long as it is statistically significant.
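The logic of using a randomized control group can be sketched in a short simulation. This is an illustrative example under stated assumptions: scores are a stable skill plus random luck, the intervention adds a fixed `true_effect`, and the function and variable names are hypothetical.

```python
import random
import statistics


def run_trial(true_effect=3.0, n=40000, fraction=0.2, seed=2):
    """Simulate an intervention trial on the lowest first-test scorers.

    Assumption: score = stable skill + luck; treatment adds true_effect.
    Returns (mean change in treated, mean change in control, difference).
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        skill = rng.gauss(50, 10)
        people.append((skill, skill + rng.gauss(0, 10)))  # (skill, first score)

    # Enroll only the lowest scorers, so RTM is expected in both groups.
    people.sort(key=lambda p: p[1])
    selected = people[: int(n * fraction)]

    # Randomly assign the enrolled participants to treatment or control.
    rng.shuffle(selected)
    half = len(selected) // 2
    treated, control = selected[:half], selected[half:]

    def mean_change(group, effect):
        # Second score = skill + effect + new luck; change = second - first.
        return statistics.mean(
            skill + effect + rng.gauss(0, 10) - first for skill, first in group
        )

    treated_change = mean_change(treated, true_effect)
    control_change = mean_change(control, 0.0)
    return treated_change, control_change, treated_change - control_change


treated_change, control_change, estimate = run_trial()
print(f"Change in treatment group: {treated_change:.1f}")
print(f"Change in control group:   {control_change:.1f}")
print(f"Estimated effect:          {estimate:.1f}")
```

Both groups improve because of regression to the mean, so the before-after change in the treatment group alone overstates the effect. The difference between the two groups is the quantity that approximates the assumed effect.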
Calculating the percent of regression to the mean
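Alternatively, you can calculate the percent of regression to the mean during your data analysis. A commonly used formula is Prm = 100(1 − r), where Prm is the percent of regression to the mean and r is the correlation between the two measurements. If r = 1, the measurements are perfectly correlated and there is no regression to the mean; if r = 0, there is 100% regression to the mean.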
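As a rough sketch of this calculation in Python, the snippet below computes Pearson’s r for two sets of measurements and converts it to a percent of regression to the mean using 100 × (1 − r). The function names and the blood pressure readings are made up for illustration, and the formula is the commonly cited one, assumed here rather than taken from a specific source.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def percent_rtm(r):
    """Percent of regression to the mean, assuming the form 100 * (1 - r)."""
    return 100 * (1 - r)


# Hypothetical systolic blood pressure readings for the same eight people.
first_reading = [118, 135, 142, 127, 150, 121, 138, 129]
second_reading = [122, 130, 136, 125, 141, 124, 139, 126]

r = pearson_r(first_reading, second_reading)
print(f"r = {r:.2f}, percent RTM = {percent_rtm(r):.0f}%")
```

A correlation of 0.8, for example, corresponds to 20% regression to the mean under this formula.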
Frequently asked questions
- What are the main types of information bias?
Information bias is a general term describing various forms of research bias arising due to systematic measurement error. The main types of information bias are:
- Recall bias
- Observer bias
- Performance bias
- Regression to the mean (RTM)
- What is a correlation coefficient?
A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.
Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions. The Pearson product-moment correlation coefficient (Pearson’s r) is commonly used to assess a linear relationship between two quantitative variables.
- Why doesn’t correlation imply causation?
The third variable and directionality problems are two main reasons why correlation isn’t causation.
The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not.
The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other.