It is slightly terrifying to get responses from such a distinguished group, so I was rather gratified to see such broad agreement on the idea of lazy thinking. A notable exception was Carlos Barbery’s response. Barbery is from the world of aid givers—he was a development banker for 25 years—and he feels that I am not sufficiently respectful, in addition to being wrong. He starts by taking me to task for not appreciating that in the middle of a crisis like the earthquake in Pakistan, it makes sense for people to fail to fill out a form. Despite the fact that the information on the forms could be very useful (the initiative, now called RISEPAK, just received the prestigious Stockholm Prize for its humanitarian contributions). Despite the fact that the economists were from the World Bank. Despite the fact that filling out the forms really did not take much time. (Many smaller NGOs did eventually see the logic of filling out the forms, though the bigger donors, from Barbery’s world, stood aloof.)

Barbery is also unsympathetic to the example of the “successful” non-working computers that I culled from a World Bank sourcebook. As he explains, “The purpose of the book was not to analyze alternative projects that might have had better outcomes . . . but rather to present a selection of projects that have helped to achieve greater empowerment at the local level.” Helped to achieve greater empowerment? Through non-working computers?

The other comments brought out the complexities of the issues I was wrestling with and exposed instances of, dare I say, lazy writing—all the places where I had thought of adding a few more lines and either forgot or thought that no one would note the difference.

I should, in particular, have been clearer about the role of randomized evaluations in my vision of how aid could be made more effective. In hindsight, it is easy to see why everyone came away with the impression—stated with particular force by Howard White—that in my ideal world, all aid would be allocated based on evidence from randomized trials. This is not what I had in mind when I argued that we could spend a lot of aid money on programs that have already been subjected to randomized evaluations. My point was that we are now in a position to base a lot of our decisions on what I have been calling hard evidence—evidence from high-quality randomized experiments and quasi-experiments—if this is what we want to do. That was not true ten years ago.

This is not to say that we have to base all or even most of our decisions on this kind of evidence. There are obviously other forms of knowledge that are both useful and usable. We know that an exchange rate is overvalued when no one wants to buy the country’s products and the treasury is busy buying up its own currency. We also know, based on simple economics and past experience, that a devaluation of the currency will make food more expensive if the country imports food, and that this would hurt fixed-income groups such as pensioners. And perhaps a very good use of aid would be to ease the transition, making sure that the pensioners do not end up starving.

In my ideal world, all judgments about aid would be based on a judicious balancing of every kind of evidence, weighted appropriately by the credibility of the methodology, which is more or less what Ian Goldin, F. Halsey Rogers, and Nicholas Stern seem to advocate. But who would do all this judicious balancing? The point of my essay, after all, was that the community of aid giving (and using) has shown no great empathy for evidence: Ruth Levine helps to explain why. I share her general optimism about the possibility of overcoming the obstacles, though my sense is that even highly intelligent and entirely well-meaning people often have trouble interpreting highly complex pieces of evidence. How else can one explain the fact that Goldin, Rogers, and Stern believe that donors should get credit for the dramatic reduction in poverty between 1981 and 2001, whereas my sense is that this was driven largely by events in India and China, where donors had very little impact? But so many things changed in these countries all at once that isolating any single causal factor is nearly impossible, and we can continue to disagree about who deserves the credit.

This is why I am inclined to favor interventions where the evidence is simple to interpret. The beauty of randomized evaluations is that the results are what they are: we compare the outcome in the treatment group with the outcome in the control group, see whether they are different, and if so, by how much. Interpreting quasi-experiments sometimes requires statistical legerdemain, which makes them less attractive, but at least there are more or less widely shared standards for what constitutes a good quasi-experiment. There are also cases where the theory seems straightforward enough that we can probably trust it to give the right answers—for example, as far as I know no one is against uniform accounting standards or transparent procedures for exports and imports.
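To make concrete what "the results are what they are" means, here is a minimal sketch of the comparison a randomized evaluation comes down to. The numbers and the names `outcomes_treatment` and `outcomes_control` are invented for illustration; they do not come from any actual study.

```python
import math

# Hypothetical outcomes (say, test scores) for two randomly assigned groups.
outcomes_treatment = [72, 65, 80, 77, 69, 74, 81, 70]
outcomes_control = [68, 61, 73, 70, 64, 66, 75, 65]

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# The estimated effect is simply the difference in mean outcomes.
effect = mean(outcomes_treatment) - mean(outcomes_control)

# Unequal-variance standard error of the difference in means.
se = math.sqrt(
    sample_variance(outcomes_treatment) / len(outcomes_treatment)
    + sample_variance(outcomes_control) / len(outcomes_control)
)

print(f"estimated effect: {effect:.2f}")
print(f"rough 95% confidence interval: "
      f"[{effect - 1.96 * se:.2f}, {effect + 1.96 * se:.2f}]")
```

Because random assignment makes the two groups comparable on average, no further modeling is needed to read the difference as the effect of the program.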

As Jagdish Bhagwati and Alice Amsden point out, this does bias one against macro policies such as free trade and industrial policy. This is not the place to debate the relative merits of these interventions (Bhagwati and Amsden would presumably be on opposite sides) or the methodology of how best to analyze these questions. However, if we leave out the more egregious examples of macro absurdity, such as Indian trade policy in the 1970s or the Great Leap Forward, I am probably willing to live with this bias. Whether I like it or not, governments will continue to make macro policies. The hope is that by setting the benchmark at policy based on hard evidence, policymakers will be forced to examine their rationales more closely.

Obviously, as Angus Deaton points out, none of this would stop someone who was really determined to steal. If the evidence suggests that a road should be built from A to B, he will be for building it, and then he will find a way to make money from it. On the other hand, at least then there will be a road between A and B, albeit one that cost more than it should have—while so many other development projects look like roads to nowhere.

Nor will aid work, as Ian Vásquez points out, unless donors have some interest in making an impact rather than grand gestures or political posturing. This is where I do see things changing, if only because the aid establishment is under such attack. Donors must fear that they will not survive unless they show some results.

The second thing I should have emphasized more is the cost of insisting on hard evidence. Goldin, Rogers, and Stern outline a number of the standard objections to randomized experiments. The two most important reflect the fact that there is no such thing as purely empirical knowledge. There are theories buried in our choice of the particular interventions that we evaluate, and theories that we use, implicitly or otherwise, to generalize from a few localized experiments to the rest of the world. I do not doubt that those theories will occasionally fail us, but they have the advantage of being simple—the similarity of education in India and Bangladesh, for example—and if we so wanted, we could reduce our dependence on theories by running more experiments. To this I would add the problem of how to deal with interventions that differ widely depending on whether they are implemented on a small scale or a large scale: the impact of sending a small number of people from each village to college cannot tell us much about the impact of sending everyone to college, because the returns to a college education would presumably be affected if everyone went. This is only a problem with certain types of interventions (there is no such problem with immunizing more children or planting more trees, for example), but where it comes up there is no way to deal with it without invoking some non-experimental knowledge.
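The college example can be made concrete with a toy model of my own construction; the functional form and the numbers are assumptions for illustration, not anything from the essay. Suppose the wage premium for a degree shrinks as the share of graduates in the workforce grows:

```python
def college_premium(graduate_share):
    """Hypothetical wage premium when a fraction `graduate_share` of
    workers hold a college degree; assumes the premium falls linearly
    as graduates become more common."""
    return 0.5 * (1.0 - graduate_share)

baseline_share = 0.10

# A small experiment barely moves the share, so it measures the premium
# at roughly the current margin.
print(college_premium(baseline_share))  # 0.45

# A universal program moves the share itself, so the premium the
# experiment measured no longer applies.
print(college_premium(1.0))  # 0.0
```

Under these assumptions, a small-scale trial would report a 45 percent wage premium while a universal program would deliver none of it: the experiment is internally valid but silent about the scaled-up policy.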

I am less convinced by their other objections. The ethical issue is potentially important, especially if the experiment required delaying the delivery of vital resources or services. One certainly needs to be sensitive to it. But for the most part, experiments bring in additional resources (because the experiment is expected to generate useful knowledge) or take advantage of an intervention’s limited scope. I also do not see why they believe that “If we can only act on detailed project evidence, then no action can be taken at the economy-wide level.” After all, it is detailed project data on deworming that eventually leads to an economy-wide action of deworming every child. What am I missing?

Finally, I am baffled by their objection that in situations where the best initiative is not clear, randomized experiments and the necessary collection of data beforehand take too much time. I think such situations are not uncommon and they do take time. But what is the alternative? Remaining ignorant? Shooting blind?

As I see it, two other potential problems with the experimental approach deserve a comment. One is that it biases us in favor of easily measured outcomes: I find Mick Moore’s comment very perceptive except where he implies—as do Raymond Offenheiser and Didier Jacobs—that things like empowerment and popular participation are not measurable. I agree that there are sometimes good reasons to focus on these factors, but, as some of the past work of MIT’s Abdul Latif Jameel Poverty Action Lab demonstrates, there are ways to measure them. However, it is also clear that the scope of the experimental approach will ultimately be limited: as we make the outcome more complex, it will be harder to measure accurately on a large enough scale.

Second, as Robert Bates rightly points out, there is some tension between the idea of international best practice and the rhetoric of countries owning their development process. My sense is that this is less a real problem (countries still have many choices, after all) than a political problem. Our response should be to redefine politically the meaning of country ownership, not to give up on international best practice.

Finally, I should have said more about what is probably the best argument for the experimental approach: it spurs innovation by making it easy to see what works. I was very much taken by Bhagwati’s idea of a Gray Peace Corps as a way of dealing with Africa’s skill shortage. In the old days we would have spent hours discussing its merits based on general principles. Now I want to try it out.