Perhaps the most important contribution of behavioral economics has been the demonstration that small changes in incentives, environment, or available information can lead to large changes in behavior. Glennerster and Kremer make a case for harnessing this insight to better understand what works—and what does not—in development. Their article provides a sampling of evidence for the significant improvements in well-being that can be achieved by seemingly trivial adjustments to the cost of sending children to school, treating drinking water with chlorine, using mosquito nets, immunizing one’s children, getting tested for HIV, and engaging in other welfare-improving activities.

The research that Glennerster and Kremer describe represents part of a historic shift in the source of development policy from abstract theory to observed behavior. Underlying this shift is the assertion that we will do more good in the world if we base our interventions on carefully collected evidence about how people actually behave rather than on assumptions about how they ought to behave.

This makes a lot of sense, but it puts a tremendous burden on the methodology through which we generate our evidence about why people do what they do. After all, if our interventions are to be based on claims about how people actually behave, then we need to be confident both that we can measure what they are doing and that we understand why they are doing it.

The methodology of choice for generating such evidence is the randomized impact evaluation. Once a novelty in development economics, randomized evaluations are now a staple of the discipline. Although extremely rigorous, they are weak on external validity, that is, on the ability to generalize a study’s findings to other contexts. Critics often worry about the wholesale reshaping of development policies based on the impact of a particular intervention in a handful of villages in a handful of countries—however precisely measured that impact may be. Such detractors would like to see the same intervention generate the same results in a dozen settings before calling for a redirection of development strategy. This, of course, is an argument for doing more randomized studies, not fewer.

But there is another way of thinking about the external-validity critique, which has implications not for how many randomized evaluations we should be doing but for how we should be designing them. At its core, the external-validity critique rests on the claim that context matters. Sometimes the charge is just a thinly veiled attack on the whole enterprise of generalization, but it can also be a call to take seriously the environmental factors that condition the behavior of the people being studied. Seen this way, recognizing the importance of context is not just an admonishment to be modest in the claims we make about general patterns, but also a call for a deeper understanding of the mechanisms that generate the outcomes we seek to affect, along with a recognition that these mechanisms may be situationally dependent.

In nearly all of the studies that Glennerster and Kremer describe, the unit of analysis is the individual or household. But for a large number of issues of interest to development economists, the community is also a relevant unit, either because we are interested in community-level outcomes per se or because we cannot really understand the individual- or household-level response apart from the community.

A growing number of studies show that individual behavior is strongly shaped by peer effects and by the social pressure exerted by fellow community members. Glennerster and Kremer describe one such study, in which villagers were much more likely to chlorinate their water when the chlorine was provided in a large public dispenser adjacent to the water tap than when it was provided in small bottles and added in the privacy of the household. Glennerster and Kremer surmise that there was something about seeing one’s neighbor add chlorine to her water, and wondering what she might think if she saw that you didn’t, that generated a sharp (and beneficial) change in behavior. The willingness of families to vaccinate their children or send them to school, or of individuals to get tested for HIV, may be similarly related to whether others in the community are willing to do these things. The likelihood that a person will attend a community meeting or volunteer her labor for a collective activity may also depend on her assessment of whether others in the community will do so, too.

If these assessments vary with the characteristics of the community—the density of its networks, the periodicity of its interactions, its ethnic homogeneity, its social and cultural norms—then we cannot hope to understand individual behavior apart from the characteristics of the community itself, and our research designs must take account of this. Randomizing across individuals or households while community-level characteristics are held constant, as they typically are within a single study site, leaves our findings vulnerable to the omission of consequential causal factors. What we need, therefore, are not just more randomized studies of all sorts, but replications of existing studies, carried out in settings selected for the variation they offer in community-level characteristics. In this way we can target research to the contextual factors that we hypothesize will shape the peer effects that, in turn, affect the individual behaviors we hope to alter.
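The point can be made concrete with a toy simulation. The sketch below (written in Python, with entirely invented numbers: the "network density" variable and all effect sizes are hypothetical, not drawn from any of the studies discussed here) randomizes individuals to treatment within each village but lets the size of the treatment effect depend on a community-level characteristic.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimated_impact(density, n=5000):
    """One village: individuals are randomized to treatment, but the
    treatment effect is amplified by the village's network density
    (a stand-in for any community-level characteristic)."""
    treated = rng.integers(0, 2, n)            # individual-level randomization
    effect = 0.05 + 0.40 * density             # hypothetical peer-mediated effect
    adopted = rng.random(n) < np.clip(0.20 + effect * treated, 0, 1)
    return adopted[treated == 1].mean() - adopted[treated == 0].mean()

# A single-site study corresponds to one row of this output; only
# replication across villages chosen for their variation in density
# reveals that the measured impact is itself a function of context.
for density in (0.1, 0.5, 0.9):
    print(f"network density {density}: estimated impact {estimated_impact(density):+.2f}")
```

A study run in any one of these villages would deliver a precise and internally valid estimate, but that estimate would be a fact about the context as much as about the intervention.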

Doing this is not easy, and it will not be cheap. The cost of additional villages (or districts or countries) is far greater than the cost of additional households. And because of the practical limits imposed by such cost considerations, additional villages must be selected carefully, on the basis of theories about peer effects, rather than by blind randomization. The greater expense and complexity of this process are inescapable. Altering our research designs to take context explicitly into account is necessary if we seek to draw policy lessons from our findings—or, at any rate, if we want to do so responsibly.