The Book of Why: The New Science of Cause and Effect
Judea Pearl and Dana Mackenzie
Basic Books, $32 (cloth)
“Correlation is not causation.”
Though true and important, the warning has hardened into the familiarity of a cliché. Stock examples of so-called spurious correlations are now a dime a dozen. As one example goes, a Pacific island tribe believed flea infestations to be good for one’s health because they observed that healthy people had fleas while sick people did not. The correlation is real and robust, but fleas do not cause health, of course: they merely indicate it. Fleas on a fevered body abandon ship and seek a healthier host. One should not seek out and encourage fleas in the quest to ward off sickness.
The rub lies in another observation: that the evidence for causation seems to lie entirely in correlations. But for seeing correlations, we would have no clue about causation. The only reason we discovered that smoking causes lung cancer, for example, is that we observed correlations in that particular circumstance. And thus a puzzle arises: if causation cannot be reduced to correlation, how can correlation serve as evidence of causation?
The Book of Why, co-authored by the computer scientist Judea Pearl and the science writer Dana Mackenzie, sets out to give a new answer to this old question, which has been around—in some form or another, posed by scientists and philosophers alike—at least since the Enlightenment. In 2011 Pearl won the Turing Award, computer science’s highest honor, for “fundamental contributions to artificial intelligence through the development of a calculus of probabilistic and causal reasoning,” and this book aims to explain what all that means for a general audience, updating his more technical book on the same subject, Causality, published nearly two decades ago. Written in the first person, the new volume mixes theory, history, and memoir, detailing both the technical tools of causal reasoning Pearl has developed and the tortuous path by which he arrived at them—all along bucking a scientific establishment that, in his telling, had long ago contented itself with data-crunching analysis of correlations at the expense of investigation of causes. There are nuggets of wisdom and cautionary tales in both these aspects of the book, the scientific as well as the sociological.
Pearl also has one big axe to grind, especially when it comes to the study of human cognition—how we think—and the hype surrounding contemporary artificial intelligence. “Much of this data-centric history still haunts us today,” he writes. It has now been eleven years since Wired magazine announced “the end of theory,” as “the data deluge makes the scientific method obsolete.” Pearl swims strenuously against this tide. “We live in an era that presumes Big Data to be the solution to all our problems,” he says, “but I hope with this book to convince you that data are profoundly dumb.” Data may help us predict what will happen—so well, in fact, that computers can drive cars and beat humans at very sophisticated games of strategy, from chess and Go to Jeopardy!—but even today’s most sophisticated techniques of statistical machine learning can’t make the data tell us why. For Pearl, the missing ingredient is a “model of reality,” which crucially depends on causes. Modern machines, he contends against a chorus of enthusiasts, are nothing like our minds.
To make the stakes clear, consider the following scenario. Suppose there is a robust, statistically significant, and long-term correlation between the color of cars and the annual rate at which they are involved in accidents. To be concrete, assume that red cars, in particular, are involved in accidents year after year at a higher rate than cars of any other color. When you go to buy a new car, should you avoid the color red in your quest to remain safe on the road?
A moment’s reflection suggests many distinct causal mechanisms that could underlie the observed correlation, and each yields different advice. On the one hand, it could be that the human visual system is not as good at gauging the distance and speed of red objects as it is with other colors. In that case, red cars could be involved in more accidents because other drivers tend to misjudge the speed and distance of approaching red cars and so collide with them more often.
On the other hand, the correlation may have nothing at all to do with the dangerousness of the color itself. It could, for example, be the byproduct of a common cause. People who choose red cars may tend to be more adventurous and thrill-seeking than the average driver, and so be involved in proportionally more accidents. Then again, the correlation may have nothing to do with driving abilities at all. People who buy red cars may just enjoy driving more than other people, and spend more hours a year on the road. In that case, one would expect there to be more accidents involving red cars even if the drivers are, on average, more careful and cautious than other drivers.
All of these hypotheses would account for the observed correlation between the color of the car and the rate of accidents. And one can easily think up other hypotheses as well. To make matters worse, the observed correlation could be the product of all of these factors working conjointly. But only the first hypothesis yields the recommendation that one should avoid buying a red car to improve one’s chances of avoiding an accident. In the other cases, the redness itself plays no causal role, but is merely an indicator of something else.
This toy example illustrates the fundamental problem of causal reasoning: How can we find our way through such a thicket of alternative explanations to the causal truth of the matter?
In some cases, the best advice is to look for more correlations, correlations between different variables. To test whether the higher rate of accidents is due to more time on the road, for example, we ought to control for time. If the true cause of the original correlation lies in how much different drivers like to drive rather than in the color itself, then the correlation should vanish when we look at the association between car color and accidents-per-mile-driven or accidents-per-hour-driven, rather than accidents-per-year. This line of thought suggests that the trick to deducing the causal from the correlational is just to comb through a large enough data set for other correlations. According to this operationalist way of thinking, all the answers lie, somehow, in the data. One just has to figure out how to pan through them appropriately to reveal the hidden causal gold nuggets.
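The "control for time" move can be made concrete with a little arithmetic. The sketch below uses invented numbers: every driver is assumed to face the same accident risk per mile, while red-car owners are assumed to drive more. The per-year rates differ, but the per-mile rates coincide, so the correlation vanishes once exposure is taken into account.

```python
# Hypothetical numbers: every driver faces the SAME accident risk per mile,
# but red-car owners are assumed to drive more miles each year.
ACCIDENTS_PER_MILLION_MILES = 4.0
MILES_PER_YEAR = {"red": 15_000, "other": 10_000}


def accidents_per_year(color):
    """Expected accidents per driver per year: the raw, uncontrolled rate."""
    return MILES_PER_YEAR[color] * ACCIDENTS_PER_MILLION_MILES / 1_000_000


def accidents_per_million_miles(color):
    """The same rate after controlling for exposure (miles driven)."""
    return accidents_per_year(color) / MILES_PER_YEAR[color] * 1_000_000


for color in MILES_PER_YEAR:
    print(color, accidents_per_year(color), round(accidents_per_million_miles(color), 6))
```

Under these assumptions red cars look more dangerous per year and exactly as safe per mile.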
• • •
Pearl began his work on artificial intelligence in the 1970s with this mindset, imparted to him by his education. For much of the scientific community throughout the twentieth century, the very idea of causation was considered suspect unless and until it could be translated into the language of pure statistics. The outstanding question was how the translation could be carried out. But step by painful step Pearl discovered that this standard approach was unworkable. Causation really cannot be reduced to correlation, even in large data sets, Pearl came to see. Throwing more computational resources at the problem, as Pearl did in his early work (on “Bayes nets,” which apply Thomas Bayes’s basic rule for updating probabilities in light of new evidence to large sets of interconnected data), will never yield a solution. In short, you will never get causal information out without beginning by putting causal hypotheses in.
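For readers who have not met it, here is a single application of Bayes's rule, the building block that Bayes nets chain together across many variables. The numbers are invented for the flea example: 80 percent of islanders are assumed healthy, and fleas are assumed to be found on 90 percent of healthy people but only 10 percent of sick ones. The update sharply raises confidence in health given fleas, yet nothing in the calculation says whether fleas cause health or merely indicate it.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior P(H | E) from a prior P(H) and the two likelihoods, by Bayes's rule."""
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


# Hypothetical numbers for the flea example (H = "healthy", E = "has fleas").
posterior = bayes_update(prior=0.8, p_evidence_given_h=0.9, p_evidence_given_not_h=0.1)
print(f"P(healthy | fleas) = {posterior:.3f}")
```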
This book is the story of how Pearl came to this realization. In its wake, he developed simple but powerful techniques using what he calls “causal graphs” to answer questions about causation, or to determine when such questions cannot be answered from the data at all. The book should be comprehensible to any reader with sufficient interest to pause over some formulas to digest their conceptual meaning (though the precise details will require some effort even by those with background in probability theory). The good news is that the main innovation that Pearl is advertising—the use of causal hypotheses—gets couched not so much in algebra-laden statistics as in visually intuitive pictures: “directed graphs” that illustrate possible causal structures, with arrows pointing from postulated causes to effects. A good deal of the book’s argument can be grasped simply by attending only to these diagrams and the various paths through them.
Consider two basic building blocks of such graphs. If two arrows emerge from a single node, then we have a “common-causal fork,” which can produce statistical correlations between properties that are not, themselves, causally related (such as car color and accident rate on the reckless-drivers-tend-to-like-the-color-red hypothesis). In this scenario, A may cause both B and C, but B and C are not causally related. On the other hand, if two different arrows go into the same node, then we have a “collider,” and that raises an entirely different set of methodological issues. In this case, A and B may jointly cause C, but A and B are not causally related. The distinction between these two structures has important consequences for causal reasoning. While controlling for a common cause can eliminate misleading correlations, for example, controlling for a collider can create them. As Pearl shows, the general analytic approach, given a certain causal model, is to trace the paths that connect nodes, noting both “back door” paths (those that run through a common cause) and any colliders along the way, and to take the appropriate precautions in each case.
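The contrast between forks and colliders shows up clearly in a small simulation. The linear Gaussian model below is an illustrative assumption, and "controlling" is done crudely by keeping only cases in which the conditioning variable is close to zero. For the fork, B and C are correlated until A is held fixed; for the collider, A and B are uncorrelated until C is held fixed, at which point a spurious negative correlation appears.

```python
import random


def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)
N = 50_000

# Fork: A -> B and A -> C; B and C have no causal link to each other.
a = [rng.gauss(0, 1) for _ in range(N)]
b = [ai + rng.gauss(0, 1) for ai in a]
c = [ai + rng.gauss(0, 1) for ai in a]
fork_raw = corr(b, c)
keep = [i for i in range(N) if abs(a[i]) < 0.1]  # crude "control" for A
fork_controlled = corr([b[i] for i in keep], [c[i] for i in keep])

# Collider: A -> C <- B; A and B are generated independently.
a2 = [rng.gauss(0, 1) for _ in range(N)]
b2 = [rng.gauss(0, 1) for _ in range(N)]
c2 = [x + y + rng.gauss(0, 1) for x, y in zip(a2, b2)]
collider_raw = corr(a2, b2)
keep2 = [i for i in range(N) if abs(c2[i]) < 0.1]  # crude "control" for C
collider_controlled = corr([a2[i] for i in keep2], [b2[i] for i in keep2])

print(f"fork: {fork_raw:.2f} -> {fork_controlled:.2f}")
print(f"collider: {collider_raw:.2f} -> {collider_controlled:.2f}")
```

In this model, the fork correlation falls from about 0.5 to roughly zero, while the collider correlation moves from roughly zero to about -0.5.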
Here are some simple examples. We know there is a positive correlation between a car being red and it being involved in an accident in a given year. But we don’t know whether the redness of the car makes it more dangerous. So we start by thinking up various causal hypotheses and representing them by directed graphs: nodes connected by arrows.
On one hypothesis, the redness is a cause of the accidents, so we draw an arrow from the node “red car” to the node “accident.” End of story.
On another hypothesis, some personality trait is the cause of both buying red cars and driving more, and driving more is the cause of more accidents (per year). The causal graph has arrows from the “personality” node to both the “red car” node and the “more driving” node (so that “personality” is a common-cause fork), and there is a further arrow from “more driving” to “accident.” The personality trait only indirectly causes accidents. There is still a connected path in this diagram from “red car” to “accident” which can explain the correlation, but it is a backdoor path: it passes through a common cause (“personality”). We can test this hypothesis by controlling either for “personality” (which may be an unknown trait) or for “more driving” (which can be measured). If the correlation still persists when either of these is controlled for, then we know that this causal structure is wrong.
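The path-tracing in this example can be mechanized. The sketch below encodes the personality hypothesis as a directed graph, lists every path between "red car" and "accident" while ignoring arrow direction, and flags a path as a back door when its first arrow points into "red car." The graph encoding, the function names, and the simplified blocking rule (which ignores descendants of colliders) are illustrative assumptions, not code from the book.

```python
# Hypothetical encoding of the "personality" hypothesis: node -> nodes it points to.
EDGES = {
    "personality": ["red car", "more driving"],
    "more driving": ["accident"],
    "red car": [],
    "accident": [],
}


def neighbors(node):
    """Yield (neighbor, direction) pairs; '->' means the arrow leaves `node`."""
    for child in EDGES[node]:
        yield child, "->"
    for parent, children in EDGES.items():
        if node in children:
            yield parent, "<-"


def all_paths(start, end, path=None):
    """All simple paths from start to end, ignoring arrow direction."""
    path = path or [(start, None)]
    current = path[-1][0]
    if current == end:
        yield path
        return
    visited = {n for n, _ in path}
    for nxt, direction in neighbors(current):
        if nxt not in visited:
            yield from all_paths(start, end, path + [(nxt, direction)])


def describe(path):
    """Render a path and report whether it enters the start node by a back door."""
    text = path[0][0] + "".join(f" {d} {n}" for n, d in path[1:])
    return text, path[1][1] == "<-"


def is_blocked(path, conditioned):
    """Simplified blocking check: a conditioned non-collider or an
    unconditioned collider blocks the path (collider descendants ignored)."""
    for i in range(1, len(path) - 1):
        node = path[i][0]
        collider = path[i][1] == "->" and path[i + 1][1] == "<-"
        if collider != (node in conditioned):
            return True
    return False


for p in all_paths("red car", "accident"):
    print(describe(p), "blocked by 'more driving':", is_blocked(p, {"more driving"}))
```

On this graph there is a single path, red car <- personality -> more driving -> accident. It is a back-door path, open when nothing is controlled for and blocked by conditioning on either "personality" or "more driving," as the paragraph above describes.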
But how do we decide which causal models to test in the first place? For Pearl, they are provided by the theorist on the basis of background information, plausible conjectures, or even blind guesses, rather than being derived from the data. The method of causal graphs allows us to test the hypotheses, both by themselves and against each other, by appeal to the data; it does not tell us which hypotheses to test. (“We collect data only after we posit the causal model,” Pearl insists, “after we state the scientific query we wish to answer. . . . This contrasts with the traditional statistical approach . . . which does not even have a causal model.”) Sometimes the data may refute a theory. Sometimes we find that none of the data we have at hand can decide between a pair of competing causal hypotheses, but new data we could acquire would allow us to do so. And sometimes we find that no data at all can serve to distinguish the hypotheses.
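The last possibility is easy to exhibit in miniature. For two variables, the hypothesis "X causes Y" and the reverse hypothesis "Y causes X" can reproduce exactly the same joint distribution, so no amount of passive observation of X and Y alone will distinguish them. This is the simplest case of what is usually called Markov equivalence. The probabilities below are invented for illustration, and exact fractions are used so that the comparison is exact.

```python
from fractions import Fraction as F
from itertools import product

# Model 1 (hypothetical numbers): X -> Y.
p_x = {1: F(3, 10), 0: F(7, 10)}
p_y_given_x = {1: {1: F(6, 10), 0: F(4, 10)}, 0: {1: F(2, 10), 0: F(8, 10)}}
joint_1 = {(x, y): p_x[x] * p_y_given_x[x][y] for x, y in product([0, 1], repeat=2)}

# Model 2: Y -> X, with parameters read off model 1's joint distribution.
p_y = {y: sum(joint_1[(x, y)] for x in (0, 1)) for y in (0, 1)}
p_x_given_y = {y: {x: joint_1[(x, y)] / p_y[y] for x in (0, 1)} for y in (0, 1)}
joint_2 = {(x, y): p_y[y] * p_x_given_y[y][x] for x, y in product([0, 1], repeat=2)}

print(joint_1 == joint_2)  # True: the observational data cannot tell the models apart
```

Telling these two models apart would take an intervention on X or Y, or background knowledge such as temporal order.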
Although this method for using hypothetical causal structures to tease out causal conclusions from statistical data is remarkably simple—and Pearl and Mackenzie give the reader examples that can be solved like logic puzzles—Pearl’s route to these methods was difficult and circuitous. The main problem was simple: as they narrate it, the entire field of statistics had sworn off explicit talk about causation altogether, so Pearl’s approach required swimming against the stream of “common wisdom” in the field. (Some statisticians, it should be noted, have disputed this characterization of the field’s history.) His divergence from the mainstream began in the late 1980s and early ’90s, and he recounts his intellectual and institutional struggles with justifiable pride.
It is an old and familiar story. Accounts of the world, both scientific and “common sense,” postulate all kinds of things—entities and laws and structures—that are not immediately observable. But the data against which such a theory is evaluated must be observable: that is what makes them data, after all—they are what is given to us by experience. Hence a gap opens up between what we believe in (the theory) and the grounds we have to believe it (the data). The gap—what philosophers have called the “underdetermination of theory by evidence”—means that all theories are fallible: the data cannot entail that the theory is correct. Some particularly sensitive souls find this epistemic gap intolerable; as a result, many sciences have recurrent movements to purge “theoretical” postulates altogether and somehow frame the science as statements about the structure of the observable data alone.
This back-to-the-data approach has been tried many times—think behaviorism in psychology and positivism in physics—and has failed just as many. In statistics, the form it took, according to Pearl, was a renunciation of all talk of causation—since, as the Enlightenment philosopher David Hume pointed out in the eighteenth century, a causal connection between events is not itself immediately observable. As Hume put it, we can observe conjunction of events—that one sort of event consistently follows another, for example—but not causal connexion. The positivist’s response: all the worse for causation! Thus, Pearl says, “In vain will you search the index of a statistics textbook for an entry on ‘cause.’ Students are not allowed to say that X is the cause of Y—only that X and Y are ‘related’ or ‘associated.’”
But what we commonly care about is effective interventions in the world, and which interventions will be effective depends on the causal structure. If the redness of a car is a cause of its being involved in accidents because red is harder to accurately see, then one will be safer buying a car of a different color. If the correlation is merely due to a common cause, such as the psychology of the buyer, then you might as well go with the color you prefer. Avoiding the red car will not magically make you a better driver. Unsurprisingly, trying to suppress talk of causation—by contenting ourselves with discussion of correlations—left the field of statistical analysis in a mess.
To be fair, the field hadn’t suppressed such talk entirely. It had just been relegated mostly to the more specialized domain of “experiments” (as opposed to “observational studies”), a subject above and beyond ordinary statistical analysis—which, on its own, can’t lead to conclusions about causes.
Indeed there is one universally recognized situation in which observed correlation is accepted as proof of causation: the randomized controlled trial (RCT). Suppose we take a large pool of car buyers and randomly sort them into two groups, the experimental group and the control group. We then force the experimental group to drive red cars and forbid the control group from doing so. Since the groups were formed by random chance, it is overwhelmingly likely (if the groups are large enough) that they will be statistically similar in all respects, both known and unknown. About the same percentage of each group, for example, will be reckless drivers. If the number of accidents in the experimental group exceeds that in the control group by a statistically significant amount, we have the “gold standard” proof that the color itself causes accidents.
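The claim that randomization balances even unknown traits can be checked by simulation. In the sketch below, 30 percent of a hypothetical population are assumed to be reckless drivers, a fact the experimenter never sees; a coin flip then decides who gets a red car. Because the coin cannot "see" recklessness, the two groups end up with nearly identical shares of reckless drivers.

```python
import random

rng = random.Random(42)
N = 100_000

# Hypothetical population: 30% reckless drivers, unknown to the experimenter.
reckless = [rng.random() < 0.3 for _ in range(N)]

# Random assignment: a fair coin, independent of recklessness, decides who drives red.
assigned_red = [rng.random() < 0.5 for _ in range(N)]


def share_reckless(in_red_group):
    """Fraction of reckless drivers among those assigned (or not) to red cars."""
    members = [r for r, a in zip(reckless, assigned_red) if a == in_red_group]
    return sum(members) / len(members)


treatment_share = share_reckless(True)
control_share = share_reckless(False)
print(f"reckless in red group: {treatment_share:.3f}, in control: {control_share:.3f}")
```

With groups this large, both shares land close to 0.30, which is the sense in which the groups are "statistically similar in all respects."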
Well, there are some caveats even here. The real gold standard is a double-blind experiment, in which neither the subjects nor the experimenters know who is in which group. In the case of car color, we would literally have to blind the drivers, which would of course raise the accident rate considerably. But let’s leave that wrinkle behind.
The key to an RCT is that by assigning members to the two groups, rather than allowing them to self-select, we control for alternative explanations. As Pearl puts it, in such a randomized design one “erases the incoming arrows” to the value of the experimental variable, in this case “red car” or “not red car.” Of course, it is not that the placement of a subject into one of the two groups is literally uncaused: it may, for example, be determined by the throw of a die or the value of the output of a random number generator. It is rather that the particular constellation of causes that determines the placement will not plausibly have any other notable effect.
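"Erasing the incoming arrows" is a purely mechanical operation on the graph, often called graph surgery. The sketch below represents a hypothetical causal graph as a dictionary from each node to the nodes it points to, and returns a modified copy in which nothing points into the intervened variable; every other arrow is left intact. The representation and the function name are illustrative assumptions.

```python
def do(edges, node):
    """Return a copy of a DAG (parent -> list of children) with every arrow
    into `node` removed, mimicking an intervention that sets `node` directly."""
    return {parent: [c for c in children if c != node] for parent, children in edges.items()}


# Hypothetical graph combining two of the hypotheses about red cars.
observational = {
    "personality": ["red car", "more driving"],
    "red car": ["accident"],
    "more driving": ["accident"],
    "accident": [],
}
interventional = do(observational, "red car")
print(interventional["personality"])  # ['more driving']
```

After surgery, "personality" no longer influences car color, so any remaining association between "red car" and "accident" must flow along the arrow from "red car" itself.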
Pearl’s formal apparatus recognizes this sort of situation by what he calls the “Do operator”: “Do x” indicates an intervention that makes x the case, as opposed to the mere observation that x is the case. If I just open my eyes and watch the traffic streaming past me, I can record who is driving a red car and who is not. But “Do red car” would require that I myself (or some other randomizing device) make it the case that a person is driving a red car. That is precisely the difference between a mere observational study, which watches but does not interfere, and an RCT. (There is enough technical detail in the book about how the Do calculus works for someone familiar with statistical methods to work out the details, but the presentation can also be read and appreciated at a more purely graphical level.)
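The gap between seeing and doing can be computed exactly in a toy model. In the sketch below, with all probabilities invented, recklessness R raises both the chance of buying a red car and the chance of an accident, while color itself has no effect at all. Observing a red car raises the probability of an accident from 0.095 to about 0.134, because red cars are disproportionately owned by reckless drivers. Forcing a red car, which cuts the arrow from R to color, leaves the probability at 0.095.

```python
from fractions import Fraction as F

# Hypothetical confounded model: recklessness R affects both color and accidents;
# color itself has NO effect on accidents here.
P_R = {1: F(3, 10), 0: F(7, 10)}
P_RED_GIVEN_R = {1: F(6, 10), 0: F(2, 10)}
P_ACC_GIVEN_R = {1: F(2, 10), 0: F(5, 100)}


def p_acc_given_seeing_red():
    """P(accident | red): condition on having observed a red car."""
    p_red = sum(P_R[r] * P_RED_GIVEN_R[r] for r in (0, 1))
    return sum(P_ACC_GIVEN_R[r] * P_R[r] * P_RED_GIVEN_R[r] / p_red for r in (0, 1))


def p_acc_given_do_red():
    """P(accident | do(red)): the arrow R -> color is cut, so R keeps its prior."""
    return sum(P_ACC_GIVEN_R[r] * P_R[r] for r in (0, 1))


print(float(p_acc_given_seeing_red()), float(p_acc_given_do_red()))
```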
Pearl does not dispute the evidential value of RCTs. But they are costly and difficult and sometimes unethical. The best evidence that smoking causes cancer in humans would come from an experiment that randomly divided a large group of infants into two groups, forcing one group to smoke two packs a day and preventing the other group from smoking. But such an experiment would obviously be morally impermissible.
One of Pearl’s major contributions is the development of the “Do calculus.” What he and his students and co-workers showed is that if one starts out with an accurate graphical model of the causal structure of a situation—arrows showing which variables might be causes of others—then in some situations one can reduce “Do” claims to merely observational claims. That is, appropriate passive observational data can provide the same sort of evidence as an RCT—assuming, of course, that the initial causal model is accurate. The advantage of an RCT is that it provides its evidence of causation without the need for any initial causal hypothesis. The advantage of the Do calculus is that it can provide equally strong tests of causal hypotheses without the need for intervention.
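To see what such a reduction looks like, consider the best-known special case, the back-door adjustment formula, which can be derived with the Do calculus. Stated as a sketch: let X be the candidate cause (say, car color), Y the effect (accidents), and Z a set of measured variables that is assumed to block every back-door path from X to Y and to contain no descendant of X. Then

```latex
P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{z} P\bigl(Y = y \mid X = x, Z = z\bigr)\, P\bigl(Z = z\bigr)
```

Every term on the right-hand side is an ordinary observational probability. The formula is only as trustworthy as the assumption that Z really does close all the back doors, and that assumption is supplied by the causal model, not by the data.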
The last part of the book enters more philosophical territory. Pearl describes the transition from the mere observation of correlations to the testing of Do claims as the ascent from Rung One to Rung Two of the ladder of causation. The difference is the difference between merely noting a correlation in the data and coming to a conclusion about causal structure. But—and now the situation becomes rather convoluted—Pearl also insists that there is yet a higher destination: Rung Three, which involves reasoning about counterfactuals.
A counterfactual makes an assertion about what would have happened had the world been other than it is in some way. As an example, consider the claim, “If Oswald had not shot Kennedy, someone else would have.” This statement takes for granted that Oswald did indeed shoot Kennedy, and makes a claim about how things would have gone had he not done so. We may not have reason to think that this counterfactual is true, but it is easy enough to imagine situations in which it would be: for example, if there had been a second assassin hidden on the grassy knoll whose job was to act as a back-up in case Oswald failed.
In a certain sense, counterfactuals are about fictional worlds or unrealized possibilities because their antecedents are contrary-to-fact: they are about what might have been but wasn’t. So on the face of it they seem to be beyond the reach and scope of normal scientific inquiry. After all, no telescope is strong enough to reveal what might have been. Indeed, it is easy to fall into the opinion that counterfactuals exceed the scope of science altogether. As the physicist Asher Peres once said, unperformed experiments have no outcomes. So what are they doing in Pearl’s book?
The answer is that Pearl seems to think they are loaded with philosophical significance: in his telling, consideration of counterfactuals is of a cognitively higher order than consideration of causal claims. Many non-human animals can engage in causal thought, he argues, but perhaps only humans and a few other very advanced animals can entertain counterfactuals. Ascending to the Third Rung, like eating the fruit of the Tree of Knowledge, sets humans apart from the rest of the animal kingdom. He devotes the last chapters of the book to counterfactuals and the grounds we can have to believe them.
But this wrenching apart of causation and counterfactual reasoning is a mistake. Counterfactuals are so closely entwined with causal claims that it is not possible to think causally but not counterfactually. This fact has often been overlooked or ignored by philosophers, so it is not much of a surprise to see Pearl fall into the same trap.
Imagine you are holding a precious and fragile Tiffany lamp over a hard stone floor. A fly is buzzing near your head, annoying you, and you wish for the buzzing to stop. What should you do? Well, you could let go of the lamp in order to swat at the fly. But as a causal reasoner you can foresee what the outcome of that would be: you might kill the fly, but the lamp would fall to the floor and shatter. Bad outcome. So you don’t let go of the lamp after all. But having accepted the causal connection between dropping the lamp and its shattering, and assuming you don’t in fact drop the lamp, you are then committed, willy-nilly, to the counterfactual “Had I let go of the lamp it would have shattered.” So you can’t get to Rung Two without being able to handle Rung Three as well.
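The lamp argument can be written out using the three-step recipe for counterfactuals that Pearl describes: abduction (infer the background facts from what actually happened), action (set the hypothetical antecedent), and prediction (recompute the outcome). The structural equation and variable names below are an illustrative assumption, and the abduction step is trivial because the floor's hardness is taken to be directly observed. The point the sketch makes is the one in the text: the very same causal equation that guided the decision also settles the counterfactual.

```python
def shattered(dropped, floor_is_hard):
    """Hypothetical structural equation: the lamp shatters iff dropped onto a hard floor."""
    return dropped and floor_is_hard


def counterfactual_shatter(observed, intervention_dropped):
    """Would the lamp have shattered had `dropped` been set to the given value?"""
    # 1. Abduction: recover background facts from the actual situation
    #    (here the floor's hardness is simply observed).
    floor_is_hard = observed["floor_is_hard"]
    # 2. Action: override the actual value of `dropped` with the hypothetical one.
    dropped = intervention_dropped
    # 3. Prediction: rerun the same causal equation with the same background.
    return shattered(dropped, floor_is_hard)


actual = {"dropped": False, "floor_is_hard": True}
print(shattered(actual["dropped"], actual["floor_is_hard"]))      # what actually happened
print(counterfactual_shatter(actual, intervention_dropped=True))  # "had I let go..."
```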
The failure to appreciate these connections between causal talk and counterfactual talk makes the later chapters of the book murkier than the preceding ones. Pearl is on firm ground discussing causation, and gets a little bollixed up trying to make more rungs to his ladder than there are. This same imprecision and lack of familiarity with the philosophical literature suffuses his discussion of free will at the very end of the book.
These aspects of The Book of Why raise interesting questions about the role of philosophy in Pearl’s career, and in science more generally. Reading this book as a philosopher, I find there is much to be gratified by. Pearl has read and appreciated philosophical discussions of causation and counterfactuals. The book cites David Hume and David Lewis and Hans Reichenbach and other philosophers. Compared to the usual disdain that scientists display toward philosophy, Pearl’s attitude is a beacon of hope.
But the book is also a cautionary tale. A standard training in statistics had adjured Pearl to avoid causal talk altogether in favor of mere correlations. He had to swim against the stream both to recover work by Sewall Wright on “linear path analysis” from the 1920s and to push the ideas forward. By contrast, I was a graduate student in History and Philosophy of Science at the University of Pittsburgh from 1980 to 1986, and I can attest that the conceptual issues that Pearl contended with were our bread and butter. Of course we read all the philosophers he did and then some. But in addition, Ken Schaffner was teaching about path analysis and Sewall Wright in the context of medical research. And more importantly, Clark Glymour was hard at work, with his students Peter Spirtes and Richard Scheines, on the Tetrad computer program for statistical tests of causal models, and on the work that was published in 1993 as Causation, Prediction, and Search.
Pearl could have saved himself literally years of effort had he been apprised of this work. He acknowledges in the book that he learned from Peter Spirtes to think of forced interventions as erasing causal arrows, but given my own background I could not but wonder how much farther Pearl would have gotten had he had the training I did as a philosopher. The physicist Richard Feynman is widely reported as saying “Philosophy of science is about as useful to scientists as ornithology is to birds.” It always surprises me that no one points out that ornithology would indeed be of great use to birds—if they could ask the ornithologists for advice, and if they could understand it.
The Book of Why provides a splendid overview of the state of the art in causal analysis. It forcefully argues that developing well-supported causal hypotheses about the world is both essential and difficult. Difficult, because causal conclusions do not flow from observed statistical regularities alone, no matter how big the data set. Rather, we must use all our clues and imagination to create plausible causal models, and then analyze those models to see whether, and how, they can be tested by data. Just crunching more numbers is not the royal road to causal insight.
But why care about causes? One reason is pure scientific curiosity: we want to understand the world, and part of that requires figuring out its hidden causal structure. But just as important, we are not mere passive observers of the world: we are also agents. We want to know how to effectively intervene in the world to prevent disaster and promote well-being. Good intentions alone are not enough. We also need insight into how the springs and forces of nature are interconnected. So ultimately, the why of the world must be deciphered if we are to understand the how of successful action.