As Abhijit Banerjee explains, randomized experiments solve a major problem: how to cleanly identify the effects of a given development program or project. They can thereby make interventions more cost-effective and bolster political support for aid. Donor institutions and governments in both wealthy and poor countries have relied too little on the powerful tool of randomization. But with better baseline data and greater attention to results, that is now changing.

This is particularly true in the areas where an experimental design can most usefully be applied—health, education, income support. Through the Development Impact Evaluation initiative (DIME), for example, the World Bank has established a far-reaching program of impact evaluations, many of them using a randomized-experiment approach. Rather than drawing policy conclusions from one-time experiments, DIME evaluates portfolios of similar programs in multiple countries to allow more robust assessments of what works. In Benin, India, Kenya, Nepal, Nicaragua, and Pakistan, for example, the Bank is supporting tests of a powerful idea: that stronger community control of schools and better community access to information (such as students’ test scores) will improve school performance and learning outcomes. By carrying out the program evaluations together with the agencies running the programs, the Bank is helping to create both the demand for evidence-based policies and the developing countries’ own capacity to generate that evidence.

One measure of the Bank’s commitment to impact evaluations is its successful partnership with MIT’s Abdul Latif Jameel Poverty Action Lab, which Banerjee co-directs. Of the 34 developing-country JPAL projects listed with funding sources, 24 have been funded partly or wholly by the Bank, and in some cases World Bank researchers are conducting the evaluations together with JPAL staff. The JPAL deserves great credit for increasing interest in and expertise on randomized evaluations. The variety of its projects—experiments with textbook provision in Kenya, nutrition for young children in India, business training for micro-entrepreneurs in Peru and the Philippines, and school vouchers in Colombia—is testament to the leadership of Banerjee and his colleagues.

But not everything can be done through randomized evaluation. First, as Banerjee notes, in some cases “randomized experiments are simply not feasible, such as in the case of exchange-rate policy or central-bank independence.” The same is true in many other cases: governments are not likely to agree to randomize reductions in tariff rates, for example, or the geographical placement of power plants. Nor can broad programs of institutional, governmental, or policy reform be randomized.

Second, as with medical trials, randomization will not always be ethical. For example, where we have good reason to believe that a program works, we cannot withhold it from members of vulnerable populations simply to make a clean randomized evaluation possible.

Third, it will never be efficient to move wholly into randomized evaluation, even for well-defined projects. To evaluate earthquake preparedness, it is less costly to go to where an earthquake has just struck than to randomize interventions globally and wait for the next Big One.

Fourth, answers can depend heavily on the cultural and social context in which questions are asked. Governments understandably resist the transfer of a program evaluated in another country, or indeed another part of a country, without adaptation to local circumstances. But it will not be possible to cover all contexts by carrying out an infinite number of randomized evaluations.

Fifth, experimentation always requires a prior decision about which programs to test. If you want to improve education, should you run a careful randomized experiment on the effects of providing textbooks to students, or of giving them deworming medicine, or of hiring an extra teacher, or of paying for their school uniforms? The choice of interventions to test depends on the context, which is why practitioners must invest heavily in collecting baseline data and doing observational studies. Too often, we lack even the basic data needed to develop an experiment—data on the number of villages in a rural area, on health and school attendance before the trial, and so on. Getting basic statistical services up and running is often a costly precondition for effective experimentation.

Sixth, there is the crucial question of scale. If we can act only on detailed project evidence, then no action can be taken at the economy-wide level. Yet we have seen repeatedly—notably in India and China over the past two decades—that economy-wide reforms and actions are the real drivers of change.

Seventh, what about sustainability? Banerjee’s analysis prioritizes cost-benefit calculations from randomized experiments above all other considerations—but those other considerations matter. Take Mexico’s well-known Progresa program, which Banerjee criticizes as an expensive means of increasing primary-school enrollment. This program is successful because it achieves other goals as well, including better health outcomes, higher secondary-school enrollments, and higher investment by poor people. The resulting domestic support has cemented the program’s effectiveness by sustaining it and allowing it to be expanded nationwide.

The history of development aid supports Banerjee’s view that there has been too little detailed microeconomic study of program efficacy. But there is another important lesson of development aid: sustainable progress in developing countries depends on improving the overall capacity of the government to deliver services and foster growth.

Banerjee is proposing, in effect, to “ring-fence” most development aid within the confines of development interventions proven to work by randomized evaluation. However, research has shown that ring-fencing offers illusory protection. Aid is largely fungible: ill-intentioned governments can play financial shell games that undermine donor intentions by shifting their own resources from the donor-targeted sector into other areas (such as weapons purchases). Supporting the accountability of public budgets and working with governments to improve the quality of overall public spending are vital, though neither is amenable to neat experiments. Furthermore, detailed external micromanagement at the project level can undermine local accountability and capacity-building.

Finally, it’s worth taking a step back for perspective. There have been serious mistakes, particularly where aid has been politically driven, as during the Cold War. (Pouring billions into Mobutu’s Zaire, for example, was tragically misguided.) Yet the development progress of the past half century has been remarkable in many ways. The number of people living in extreme poverty (subsisting on less than one dollar per day) fell by 400 million between 1981 and 2001, despite rapid population growth. In 1970 nearly two in four adults in developing countries were illiterate; today only one in four is. And life expectancy in developing countries has increased by more than 20 years since 1950. Too many countries—especially in sub-Saharan Africa—still lag behind economically, but the last decade or so has seen improvements in governance and the return of growth across much of the continent. And even where economic growth has stagnated, there has often been major progress on some social indicators. Progress is driven primarily by domestic action, but international institutions and bilateral assistance have often promoted the kind of policies that have led to change.

Banerjee is cautiously optimistic about the future, as are we, but we should also be cautiously optimistic about the past. There are reasons to believe that the productivity of aid has risen recently. Donors and developing-country governments alike have learned from economic history and experience: developing-country policies and governance have improved, donors are giving more aid to countries that will use it well and are focusing on poverty, and donors are providing aid through less burdensome methods. This progress must continue; while microanalysis of randomized experiments has an important role to play, it alone won’t get us there. Consider Mozambique, which emerged from civil war in the early 1990s. Making broad macro judgments about prospects for development, donors decided to invest heavily in Mozambique’s reconstruction, and poverty there fell sharply in the 1990s. Had they insisted first on results from randomized experiments, the opportunity might have been lost.

Without the full set of tools for learning and understanding, a narrow insistence on the good science of randomized evaluation could turn into an intellectual straitjacket. We, like Banerjee, will continue to champion randomized evaluations. But policymakers and those who would support them also have to learn from a broad range of experiences and tackle the problems of governance, institutions, and policies at the level of the economy as a whole.