Simpson's paradox is named after statistician Edward Simpson, who first described the effect in a 1951 paper titled "The Interpretation of Interaction in Contingency Tables." Simpson described a scenario where the results of a scientific study appear to support one conclusion, but when the same results are split to account for a particular variable, they support a different conclusion.
Simpson gave the example of a researcher looking at a deck of cards to determine whether the proportion of "court cards" (King, Queen, Jack) is the same among red and black cards. But in this scenario, it happens that the researcher's baby has played with, and slobbered over, some of the cards.
The researcher counts the slobbery cards and the clean cards separately, just in case that factor happens to be relevant, but also looks at the totals for the whole deck. The results are below.
| Color | Slobbery: Court | Slobbery: Plain | Slobbery: % Court | Clean: Court | Clean: Plain | Clean: % Court | All Cards: Court | All Cards: Plain | All Cards: % Court |
|---|---|---|---|---|---|---|---|---|---|
| Red | 4 | 8 | 33.3% | 2 | 12 | 14.3% | 6 | 20 | 23.1% |
| Black | 3 | 5 | 37.5% | 3 | 15 | 16.7% | 6 | 20 | 23.1% |
The researcher has found a surprising result. In both the clean and the slobbery sets, the proportion of court cards is higher for black cards than for red. In the full deck, of course, the proportions are equal.
Next, Simpson took the same numbers but applied different labels. Suppose the researcher is evaluating an experimental medical treatment.
| Outcome | Male: Untreated | Male: Treated | Female: Untreated | Female: Treated | All Patients: Untreated | All Patients: Treated |
|---|---|---|---|---|---|---|
| Alive | 4 | 8 | 2 | 12 | 6 | 20 |
| Dead | 3 | 5 | 3 | 15 | 6 | 20 |
I've left the percentages out of this table, but it should be clear that a higher percentage of both males and females receiving the treatment are still alive, compared with those who did not receive the treatment. But looking at the total number of patients, the treatment's advantage seems to disappear.
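For anyone who wants to see the omitted percentages, here is a minimal Python sketch that computes them from the counts in the treatment table. The names `patients`, `survival_rate`, and `pooled` are my own, chosen for illustration; they do not come from Simpson's paper.

```python
# (alive, dead) counts copied from the treatment table above
patients = {
    "Male": {"Untreated": (4, 3), "Treated": (8, 5)},
    "Female": {"Untreated": (2, 3), "Treated": (12, 15)},
}

def survival_rate(alive, dead):
    """Share of patients in a group who are still alive."""
    return alive / (alive + dead)

def pooled(group):
    """Add up the (alive, dead) counts for one treatment group across both sexes."""
    alive = sum(patients[sex][group][0] for sex in patients)
    dead = sum(patients[sex][group][1] for sex in patients)
    return alive, dead

# Survival rates within each sex
for sex, groups in patients.items():
    for group, (alive, dead) in groups.items():
        print(f"{sex:7} {group:10} {survival_rate(alive, dead):.1%}")

# Survival rates with males and females pooled
for group in ("Untreated", "Treated"):
    print(f"{'All':7} {group:10} {survival_rate(*pooled(group)):.1%}")
```

Within each sex the treated group comes out roughly four percentage points ahead, while the pooled rates are both exactly one half.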
Was the treatment a success? Simpson argues that we must conclude, based on its effectiveness for both males and females, that it was. But it is critically important that researchers break out the data to control for males vs. females, or they will miss this vital information.
This effect also appeared in an actual medical study, published in 1986. Researchers looked at the effectiveness of percutaneous nephrolithotomy (PCNL), which involves removing kidney stones through a small puncture in the skin, a procedure that was less invasive than open surgery. Gathering records on 700 kidney stone surgeries, they found that PCNL appeared at first glance to be more effective than open surgery. But when the researchers controlled for the size of the stones, they found a surprise.
| Stone Size | Open Surgery: Success | Open Surgery: Total | Open Surgery: % Success | PCNL: Success | PCNL: Total | PCNL: % Success |
|---|---|---|---|---|---|---|
| All Stones | 273 | 350 | 78% | 289 | 350 | 83% |
| Small Stones (< 2 cm) | 81 | 87 | 93% | 234 | 270 | 87% |
| Large Stones (> 2 cm) | 192 | 263 | 73% | 55 | 80 | 69% |
Open surgery had the higher success rate for small stones and for large stones alike. So while a higher percentage of patients receiving PCNL had no further troubles with kidney stones, this was not due to the treatment's effectiveness. It was due to the fact that surgeons were more likely to perform PCNL on patients with smaller stones, which were treated successfully more often under either procedure. As a result, PCNL appeared to be more effective than it actually was. In reality, it was the size of the stones, not the type of surgery, that was the biggest determinant of success.
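To make the role of stone size concrete, here is a rough Python sketch that recomputes each procedure's success rate as if both had faced the same mix of small and large stones. This is direct standardization, a technique I am bringing in for illustration; it is not something the study is described as doing here. The choice of all 700 cases as the common mix is an assumption, and the function names are hypothetical.

```python
# (successes, total) counts copied from the kidney stone table above
results = {
    "Open Surgery": {"small": (81, 87), "large": (192, 263)},
    "PCNL": {"small": (234, 270), "large": (55, 80)},
}

def overall_rate(by_size):
    """Raw success rate with both stone sizes pooled together."""
    successes = sum(s for s, _ in by_size.values())
    total = sum(n for _, n in by_size.values())
    return successes / total

def small_stone_share(by_size):
    """Fraction of a procedure's cases that involved small stones."""
    total = sum(n for _, n in by_size.values())
    return by_size["small"][1] / total

def standardized_rate(by_size, case_mix):
    """Success rate if the procedure had faced the given mix of stone sizes."""
    mix_total = sum(case_mix.values())
    return sum(
        (s / n) * case_mix[size] / mix_total
        for size, (s, n) in by_size.items()
    )

# Assumption for illustration: use all 700 cases as the common case mix.
common_mix = {
    size: sum(results[proc][size][1] for proc in results)
    for size in ("small", "large")
}

for proc, by_size in results.items():
    print(
        f"{proc:12} raw {overall_rate(by_size):.1%}  "
        f"small-stone share {small_stone_share(by_size):.1%}  "
        f"standardized {standardized_rate(by_size, common_mix):.1%}"
    )
```

The raw rates favor PCNL, but once both procedures are scored against the same mix of stones, open surgery comes out ahead, which matches the size-by-size rows of the table.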
The paradox also showed up in a discrimination lawsuit against the graduate programs of the University of California, Berkeley. The graduate schools had accepted 44% of all male applicants, but only 35% of female applicants. But when researchers broke the numbers down by department, they couldn't find a department responsible for the discrepancy. A breakdown of the top six departments shows an even greater apparent bias in the aggregate, but no department has a statistically significant bias toward males, and some departments actually show a significant bias toward female applicants.
| Department | Men: Applicants | Men: % Admitted | Women: Applicants | Women: % Admitted |
|---|---|---|---|---|
| TOTAL | 2691 | 45% | 1835 | 30% |
| A | 825 | 62% | 108 | 82% |
| B | 560 | 63% | 25 | 68% |
| C | 325 | 37% | 593 | 34% |
| D | 417 | 33% | 375 | 35% |
| E | 191 | 28% | 393 | 24% |
| F | 373 | 6% | 341 | 7% |
The appearance of bias in the overall totals is due to the large number of males applying to science and engineering departments (represented by A and B in the table above). Graduates with degrees in these fields are in high demand, so these departments accept a higher percentage of applicants. Humanities and social science departments draw a higher percentage of female applicants, but also are more selective in whom they admit. As a result, with more women applying to the more selective departments, fewer women get admitted overall. But within any given department, women are not at a significant disadvantage.
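The arithmetic behind this is a weighted average: each group's overall admission rate is its department rates weighted by where its members applied. Here is a small Python sketch of that calculation. Because the table gives only rounded percentages, the admitted counts it implies are approximate, and the function names are hypothetical ones of my own.

```python
# (applicants, admission rate) per department, copied from the table above.
# The rates are rounded percentages, so the results are approximate.
men = {"A": (825, 0.62), "B": (560, 0.63), "C": (325, 0.37),
       "D": (417, 0.33), "E": (191, 0.28), "F": (373, 0.06)}
women = {"A": (108, 0.82), "B": (25, 0.68), "C": (593, 0.34),
         "D": (375, 0.35), "E": (393, 0.24), "F": (341, 0.07)}

def aggregate_rate(by_dept):
    """Overall admission rate: department rates weighted by applicant counts."""
    applicants = sum(n for n, _ in by_dept.values())
    admitted = sum(n * rate for n, rate in by_dept.values())
    return admitted / applicants

def share_applying_to(by_dept, depts):
    """Fraction of a group's applicants who applied to the given departments."""
    applicants = sum(n for n, _ in by_dept.values())
    return sum(by_dept[d][0] for d in depts) / applicants

for label, by_dept in (("Men", men), ("Women", women)):
    print(
        f"{label:6} overall {aggregate_rate(by_dept):.0%}  "
        f"applied to A or B: {share_applying_to(by_dept, ['A', 'B']):.0%}"
    )
```

The weighted averages land on the TOTAL row of the table, and the share applying to departments A and B is where the two groups differ most: about half of the men, against well under a tenth of the women.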
Sometimes when you dig into the details, you don't just get a more complete story. Sometimes you get a different story entirely.