
Simpson's Paradox

Simpson's paradox is named after statistician Edward Simpson, who described the effect in a 1951 paper titled "The Interpretation of Interaction in Contingency Tables." Simpson described a scenario in which the results of a study appear to support one conclusion, but splitting the data to account for a particular variable leads to a different conclusion.

Simpson gave the example of a researcher examining a deck of cards to determine whether the proportion of "court cards" (King, Queen, Jack) is the same among red and black cards. In this scenario, however, the researcher's baby has played with, and slobbered over, some of the cards.

The researcher counts the slobbery cards and the clean cards separately, just in case that factor happens to be relevant, but also looks at the totals for the whole deck. The results are below.

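| Color | Slobbery: Court | Slobbery: Plain | Slobbery: % Court | Clean: Court | Clean: Plain | Clean: % Court | All Cards: Court | All Cards: Plain | All Cards: % Court |
|---|---|---|---|---|---|---|---|---|---|
| Red | 4 | 8 | 33.3% | 2 | 12 | 14.3% | 6 | 20 | 23.1% |
| Black | 3 | 5 | 37.5% | 3 | 15 | 16.7% | 6 | 20 | 23.1% |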

The researcher has found a surprising result. In both the clean and the slobbery sets, the proportion of court cards is higher for black cards than for red. In the full deck, of course, the proportions are equal.
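To check the arithmetic, here is a minimal Python sketch that recomputes the court-card proportions from the counts in the table. The `counts` dictionary and the helper names are illustrative choices, not anything from Simpson's paper. Note that the pooled figures come from adding the raw counts, not from averaging the two percentages.

```python
# (court, plain) counts for each color, split by slobbery vs. clean cards.
counts = {
    "slobbery": {"red": (4, 8), "black": (3, 5)},
    "clean": {"red": (2, 12), "black": (3, 15)},
}

def court_share(court: int, plain: int) -> float:
    """Fraction of cards in a group that are court cards."""
    return court / (court + plain)

def pooled_share(color: str) -> float:
    """Court-card share for one color over the whole deck (raw counts added)."""
    court = sum(counts[subset][color][0] for subset in counts)
    plain = sum(counts[subset][color][1] for subset in counts)
    return court_share(court, plain)

# Within each subset, black cards have the higher share of court cards...
for subset, by_color in counts.items():
    red = court_share(*by_color["red"])
    black = court_share(*by_color["black"])
    print(f"{subset}: red {red:.1%}, black {black:.1%}")

# ...but over the full deck the two shares are identical.
print(f"all cards: red {pooled_share('red'):.1%}, black {pooled_share('black'):.1%}")
```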

Next, Simpson took the same numbers but applied different labels. Suppose the researcher is evaluating an experimental medical treatment.

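| Outcome | Male: Untreated | Male: Treated | Female: Untreated | Female: Treated | All Patients: Untreated | All Patients: Treated |
|---|---|---|---|---|---|---|
| Alive | 4 | 8 | 2 | 12 | 6 | 20 |
| Dead | 3 | 5 | 3 | 15 | 6 | 20 |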

I've left the percentages out of this table, but it should be clear that a higher percentage of both males and females receiving the treatment are still alive, compared with those who did not receive the treatment. But looking at the total number of patients, the treatment's advantage seems to disappear.
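The percentages left out of the table are easy to compute. The Python sketch below (the names `outcomes`, `survival_rate`, and `pooled_survival` are hypothetical) prints the survival rate for each group, then writes each pooled rate as a weighted average of the male and female rates. That makes the mechanism visible: 27 of the 40 treated patients are female, and females have the lower survival rate in both arms, so the treated arm's pooled rate is pulled down to match the untreated arm's.

```python
# (alive, dead) counts from the table, keyed by (sex, treatment status).
outcomes = {
    ("male", "untreated"): (4, 3),
    ("male", "treated"): (8, 5),
    ("female", "untreated"): (2, 3),
    ("female", "treated"): (12, 15),
}
SEXES = ("male", "female")

def survival_rate(alive: int, dead: int) -> float:
    """Fraction of patients in a group who are still alive."""
    return alive / (alive + dead)

def group_size(sex: str, status: str) -> int:
    """Number of patients of one sex in one arm."""
    return sum(outcomes[(sex, status)])

def pooled_survival(status: str) -> float:
    """Survival for a whole arm, as a weighted average of the sex-specific rates.

    Each weight is that sex's share of the arm, so the result equals
    total alive / total patients for the arm.
    """
    arm_size = sum(group_size(sex, status) for sex in SEXES)
    return sum(
        (group_size(sex, status) / arm_size) * survival_rate(*outcomes[(sex, status)])
        for sex in SEXES
    )

for status in ("untreated", "treated"):
    for sex in SEXES:
        rate = survival_rate(*outcomes[(sex, status)])
        print(f"{status} {sex}: {rate:.1%} alive ({group_size(sex, status)} patients)")
    print(f"{status} pooled: {pooled_survival(status):.1%} alive")
```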

Was the treatment a success? Simpson argues that we must conclude, based on its effectiveness for both males and females, that it was. But it is critically important that researchers break out the data to control for males vs. females, or they will miss this vital information.

In the mid-1990s this effect appeared in an actual medical study. Researchers looked at the effectiveness of percutaneous nephrolithotomy (PCNL), which removes kidney stones through a small puncture in the skin and is less invasive than open surgery. Gathering records on 700 kidney stone surgeries, they found that PCNL appeared at first glance to be more effective than open surgery. But when the researchers controlled for the size of the stones, they found a surprise.

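| Stone size | Open Surgery: Success | Open Surgery: Total | Open Surgery: % Success | PCNL: Success | PCNL: Total | PCNL: % Success |
|---|---|---|---|---|---|---|
| All Stones | 273 | 350 | 78% | 289 | 350 | 83% |
| Small Stones (< 2 cm) | 81 | 87 | 93% | 234 | 270 | 87% |
| Large Stones (> 2 cm) | 192 | 263 | 73% | 55 | 80 | 69% |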

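Although a higher percentage of patients receiving PCNL had no further trouble with kidney stones, this was not due to the treatment's effectiveness. Surgeons were more likely to perform PCNL on patients with small stones (270 of the 350 PCNL cases, compared with 87 of the 350 open surgeries), and small stones had higher success rates under either procedure. Within each stone size, open surgery actually had the higher success rate: 93% versus 87% for small stones, and 73% versus 69% for large ones. As a result, PCNL appeared to be more effective than it actually was. In reality, it was the size of the stones, not the type of surgery, that was the biggest determinant of success.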
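The article doesn't say exactly how the researchers adjusted for stone size. One standard technique is direct standardization: apply each treatment's size-specific success rates to a single shared mix of small and large stones, so neither treatment benefits from getting the easier cases. The Python sketch below assumes that the shared mix is the combined size distribution of all 700 patients; the variable and function names are hypothetical.

```python
# (successes, total) from the table, by treatment and stone size.
results = {
    "open": {"small": (81, 87), "large": (192, 263)},
    "pcnl": {"small": (234, 270), "large": (55, 80)},
}
SIZES = ("small", "large")

def crude_rate(treatment: str) -> float:
    """Success rate ignoring stone size (raw counts pooled)."""
    successes = sum(results[treatment][size][0] for size in SIZES)
    total = sum(results[treatment][size][1] for size in SIZES)
    return successes / total

# Assumed standard population: the stone-size mix of all 700 patients combined.
patients_by_size = {size: sum(results[t][size][1] for t in results) for size in SIZES}
total_patients = sum(patients_by_size.values())
weights = {size: patients_by_size[size] / total_patients for size in SIZES}

def standardized_rate(treatment: str) -> float:
    """Size-specific success rates reweighted to the shared size mix."""
    return sum(
        weights[size] * results[treatment][size][0] / results[treatment][size][1]
        for size in SIZES
    )

for t in results:
    small_share = results[t]["small"][1] / sum(results[t][size][1] for size in SIZES)
    print(f"{t}: crude {crude_rate(t):.1%}, standardized {standardized_rate(t):.1%}, "
          f"small stones {small_share:.1%} of cases")
```

With these counts, PCNL leads on the crude rates, while open surgery leads once both treatments are evaluated on the same mix of stones. Other adjustment methods (for example, stratified odds ratios) would be reasonable alternatives; standardization is used here only because it is easy to follow.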

The paradox also showed up in a discrimination lawsuit against the graduate programs of the University of California, Berkeley. The graduate schools had accepted 44% of all male applicants, but only 35% of female applicants. But when researchers broke the data down by department, they couldn't find a department responsible for the discrepancy. A breakdown of the top six departments shows an even greater apparent bias in the aggregate, yet no department has a statistically significant bias toward males, and some departments actually show a significant bias toward female applicants.

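| Department | Men: Applicants | Men: % Admitted | Women: Applicants | Women: % Admitted |
|---|---|---|---|---|
| Total (six departments) | 2691 | 45% | 1835 | 30% |
| A | 825 | 62% | 108 | 82% |
| B | 560 | 63% | 25 | 68% |
| C | 325 | 37% | 593 | 34% |
| D | 417 | 33% | 375 | 35% |
| E | 191 | 28% | 393 | 24% |
| F | 373 | 6% | 341 | 7% |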

The appearance of bias in the overall totals is due to the large number of males applying to science and engineering departments (represented by A and B in the table above). Graduates with degrees in these fields are in high demand, so these departments accept a higher percentage of applicants. Humanities and social science departments draw a higher percentage of female applicants, but are also more selective about whom they admit. As a result, because more women apply to the more selective departments, a smaller share of women are admitted overall. Within any given department, however, women are not at a disadvantage.
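The totals row can be reproduced as a weighted average of the department rates. This Python sketch uses the rounded percentages from the table, so the reconstructed admission counts, and therefore the recomputed totals, are approximations; the `Dept` type and helper names are hypothetical. It also prints each department's share of female applicants next to its overall admission rate, which shows women concentrated in the departments that admit fewer applicants.

```python
from typing import NamedTuple

class Dept(NamedTuple):
    men_applicants: int
    men_admit_pct: float    # rounded percentage, as printed in the table
    women_applicants: int
    women_admit_pct: float  # rounded percentage, as printed in the table

departments = {
    "A": Dept(825, 62, 108, 82),
    "B": Dept(560, 63, 25, 68),
    "C": Dept(325, 37, 593, 34),
    "D": Dept(417, 33, 375, 35),
    "E": Dept(191, 28, 393, 24),
    "F": Dept(373, 6, 341, 7),
}

def aggregate_rate(sex: str) -> float:
    """Admission rate over all six departments for 'men' or 'women'.

    Each department's rate is weighted by its number of applicants of that sex,
    which equals (estimated) total admitted / total applicants.
    """
    applicants = [getattr(d, f"{sex}_applicants") for d in departments.values()]
    pcts = [getattr(d, f"{sex}_admit_pct") for d in departments.values()]
    admitted = sum(n * pct / 100 for n, pct in zip(applicants, pcts))
    return admitted / sum(applicants)

print(f"Aggregate: men {aggregate_rate('men'):.1%}, women {aggregate_rate('women'):.1%}")

for name, d in departments.items():
    total = d.men_applicants + d.women_applicants
    women_share = d.women_applicants / total
    overall_pct = (d.men_applicants * d.men_admit_pct
                   + d.women_applicants * d.women_admit_pct) / total
    verdict = "higher" if d.women_admit_pct > d.men_admit_pct else "lower"
    print(f"{name}: {women_share:.0%} women applicants, {overall_pct:.0f}% admitted overall, "
          f"women's rate {verdict} than men's")
```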

Sometimes when you dig into the details, you don't just get a more complete story. Sometimes you get a different story entirely.
