By the fourth day after the October 2005 earthquake in northern Pakistan, the world had woken up to the fact that something very big had happened. The government was estimating that 50,000 or more people had been injured or killed, and many survivors were likely trapped somewhere without water or food. The reaction was immediate and life-affirming. Everyone showed up to help: international and local NGOs, the United Nations, and groups of college students with rented trucks full of food and other necessities. Money flowed in from everywhere. The Indian government, reversing a policy of many years, announced that it would open the highly sensitive border between the two Kashmirs so that aid could flow more easily.

In the middle of all this excitement, a small group of economists based primarily in the United States started worrying about how the aid would get to the right people. There were thousands of villages in the area, including some that were a hike of six hours or more from the road. How would aid workers find out which ones among these were badly hit? No one seemed to know. To work efficiently, the workers would need a map of the area with the geographic coordinates for all the villages—then they would be able to figure out the distance between the villages and the epicenter of the quake. But no one in Pakistan seemed to have such a map, and no one in charge seemed to feel the need for one. So the economists, Tahir Andrabi of Pomona College; Ali Cheema of Lahore University; Jishnu Das, Piet Buys, and Tara Vishwanath of the World Bank; and Asim Khwaja of the Kennedy School of Government at Harvard, set about finding one and making it available.

Without such a map, there was an obvious danger that most of the aid would end up in the villages that were closer to the road, where the damage was more visible. There would be places that no one among the aid givers had heard of: who was going to get aid to them? To make matters worse, no one was coordinating the hundreds of aid groups. No one was keeping track of where the aid had reached and where it was yet to reach. As a result, some villages were ending up with many trucks from different donors while others were left waiting for their first consignment.

Improving coordination would not be hard, the economists realized. All that was needed was an office or Web site to which everyone could report the names and locations of the villages where they had sent aid and the amounts sent. It would then be easy to build a database with reliable information about where the next consignments should go.
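
To make the idea concrete, here is a minimal sketch of the kind of tracking the economists had in mind, assuming nothing about the actual form they designed; the village names, donors, and field layout below are invented for illustration.

```python
from collections import defaultdict

# Each report a donor files: (donor, village, number of consignments sent).
# All names and figures here are invented.
reports = [
    ("NGO A", "Bagh", 3),
    ("NGO B", "Bagh", 2),
    ("UN team", "Balakot", 1),
]

# The full list of villages would come from the map the economists assembled.
known_villages = {"Bagh", "Balakot", "Muzaffarabad"}

aid_by_village = defaultdict(int)
for donor, village, consignments in reports:
    aid_by_village[village] += consignments

# Villages still waiting for their first consignment: where the next trucks should go.
unserved = known_villages - set(aid_by_village)
print(sorted(unserved))  # ['Muzaffarabad']
```

Even a database this crude would have answered the question the economists could not get anyone to ask: which villages had received nothing at all.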

So, with the help of some contacts in the IT industry and some students at Lahore University, they designed a simple form and approached donors with a simple request: whenever you send out a consignment, please fill out one of these. There were paper copies available as well as a Web-based form and a call center.

The reaction, when it was not actually hostile, tended to be derisive: “Are you mad? You want us to spend time filling out forms when people are dying? We need to go and go fast.” Go where? the economists wanted to ask. But nobody seemed to care.

The Edhi Foundation, perhaps the most reputable Pakistani NGO, did not fill out a single form. The United Nations team filled out a few. The Pakistani army corps eventually agreed that the project was a good idea, but not before rejecting it completely for several days. Many smaller NGOs were eventually persuaded to join the effort, but the biggest players, for the most part, went their own way.

In many ways this episode captures very well one of the core problems with delivering aid: institutional laziness. Here many of the standard problems were not an issue: the donors and the intermediaries were both genuinely trying to help. It is true that filling out forms is less gratifying than handing out aid; but no one was trying to deprive the aid workers of that moment of satisfaction. All they had to do was to spend the extra few minutes it would take to fill out a simple form and learn about where aid had reached and where it had not. But no one could be bothered to put in the time it would have taken to think harder about what they were doing. Aid thinking is lazy thinking.

• • •

A sad and wonderful example of how deep this lazy thinking runs is a book that the World Bank brought out a few years ago with the express intention, ironically, of rationalizing the business of aid-giving. The book, called Empowerment and Poverty Reduction: A Sourcebook, was meant to be a catalogue of the most effective strategies for poverty reduction, brought together to give potential donors a sense of the current best practice. It contains a very long list of recommended initiatives, including computer kiosks for villages; cell phones for rural areas; scholarships for girls attending secondary school; school-voucher programs for poor children; joint forest-management programs; water-users groups; citizen report cards for public services; participatory poverty assessments; Internet access for tiny firms; land titling; legal reform; micro-credit based on group lending; and many others. While many of these are surely good ideas, the authors of the book do not tell us how they know that they work.

Figuring out what works is not easy—a large body of literature documents the pitfalls of the intuitive approach to program evaluation. When we do something and things get better, it is tempting to think that it was because of what we did. But we have no way of knowing what would have happened in the absence of the intervention. For example, a study of schools in western Kenya by Paul Glewwe, Michael Kremer, Sylvie Moulin, and Eric Zitzewitz compared the performance of children in schools that used flip charts for teaching science and schools that did not and found that the former group did significantly better in the sciences even after controlling for all other measurable factors. An intuitive assessment might have readily ascribed the difference to the educational advantages of using flip charts, but these researchers wondered why some schools had flip charts when a large majority did not. Perhaps the parents of children attending these schools were particularly motivated, and this motivation led independently both to the investment in the flip charts and, more significantly, to the goading of their children to do their homework. Perhaps these schools would have done better even if there were no such things as flip charts.

Glewwe and company therefore undertook a randomized experiment: 178 schools in the same area were sorted alphabetically, first by geographic district, then by geographic division, and then by school name. Then every other school on that list was assigned to be a flip-chart school. This was essentially a lottery, which guaranteed that there were no systematic differences between the two sets of schools. If we were to see a difference between the sets of schools, we could be confident that it was the effect of the flip charts. Unfortunately, the researchers found no difference between the schools that won the flip-chart lottery and the ones that lost.
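
The assignment procedure is simple enough to write down in a few lines. Below is a sketch of the alternating rule described above; the school names are invented, and the real study used the full list of 178 schools.

```python
# Sort the schools by district, then division, then name, and assign
# every other school on the sorted list to receive flip charts.
schools = [
    ("District 2", "Division A", "School Z"),
    ("District 1", "Division B", "School Y"),
    ("District 1", "Division A", "School X"),
    ("District 2", "Division B", "School W"),
]

schools.sort()  # alphabetical: district, then division, then school name

flip_chart_schools = schools[::2]   # every other school gets flip charts
comparison_schools = schools[1::2]  # the rest form the comparison group
```

Because a school's position on an alphabetical list has nothing to do with its parents' motivation or its pupils' ability, the two groups should differ only by chance, which is what makes the comparison credible.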

Randomized trials like these—that is, trials in which the intervention is assigned randomly—are the simplest and best way of assessing the impact of a program. They mimic the procedures used in trials of new drugs, which is one situation in which, for obvious reasons, a lot of care has gone into making sure that only the interventions that really work get approved, though of course not with complete success. In many ways social programs are very much like drugs: they have the potential to transform the life prospects of people. It seems appropriate that they should be held to the same high standards.

Of course, even randomized trials are not perfect. Something that works in India may fail in Indonesia. Ideally, there should be multiple randomized trials in varying locations. There is also no substitute for thinking—there are often clear and predictable reasons that what works in Kenya will not work in Cameroon. Some other ideas are plain silly, or contrary to the logic of everything we know; there is no reason to waste time testing these. And there are times when randomized experiments are simply not feasible, such as in the case of exchange-rate policy or central-bank independence: it clearly makes no sense to assign countries exchange rates at random as you might assign them flip charts. That said, one would not want to spend a lot of money on an intervention without doing at least one successful randomized trial if one is possible.

When we talk of hard evidence, we will therefore have in mind evidence from a randomized experiment, or, failing that, evidence from a true natural experiment, in which an accident of history creates a setting that mimics a randomized trial. A wonderful natural experiment has helped us, for example, to support a classic assumption about education: that students perform better in smaller classes. This idea might seem self-evident, but it is surprisingly difficult to prove because of the way classes are usually formed: students are often assigned to smaller classes when they are performing poorly. As a result, it may look as if smaller classes are bad for students. The solution came when the economists Josh Angrist and Victor Lavy noticed that Israeli schools use what is called Maimonides’ Rule, according to which classes may not contain more than 40 students. As soon as a class exceeds that size, it is broken in two, no matter how the students are performing. So if performance improves when classes are broken up, we know that the effects are due to size. Based on this observation, Angrist and Lavy found that “reducing class size induces a significant and substantial increase in test scores for fourth and fifth graders, although not for third graders.”
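
The mechanics of the rule are easy to reproduce. Here is a minimal sketch of the class size the rule predicts at each enrollment level, which is the discontinuity Angrist and Lavy exploited; the cap of 40 comes from the rule itself, and everything else is illustrative.

```python
import math

def predicted_class_size(enrollment: int) -> float:
    """Class size implied by Maimonides' Rule: split enrollment into the
    smallest number of classes that keeps every class at 40 or fewer."""
    n_classes = math.ceil(enrollment / 40)
    return enrollment / n_classes

print(predicted_class_size(40))  # 40.0 -- one full class
print(predicted_class_size(41))  # 20.5 -- one extra student forces a split
print(predicted_class_size(80))  # 40.0 -- two full classes
print(predicted_class_size(81))  # 27.0 -- forced into three classes
```

The sharp drops at 41, 81, and so on are what make the rule useful: a school with 41 students is unlikely to differ much from one with 40, yet its classes are half the size.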

What is striking about the list of strategies offered by the World Bank’s sourcebook is the lack of distinction made between strategies founded on the hard evidence provided by randomized trials or natural experiments and the rest. To the best of my knowledge, only one of the strategies listed there—school vouchers for poor students—has been subjected to a randomized evaluation (in Colombia). In this case, the evaluation happened because the Colombian government found it politically necessary to allocate the vouchers by lottery. Comparing those who won the lottery with those who did not provided the perfect experiment for studying the impact of the program, and a study by Josh Angrist and others took advantage of it. In contrast, legal reform, for example, is justified in the sourcebook thus: “The extent to which a society is law-bound affects its national income as well as its level of literacy and infant mortality.” This may be true, but the available evidence, which comes from comparing the more law-abiding countries with the rest, is too tangled to warrant such a confident recommendation. One could imagine, for example, that countries that have been through a long civil war are both less law-abiding and less literate, but it would be silly to conclude that they are less literate because they are less law-abiding. Yet the sourcebook shows no more enthusiasm for vouchers than it does for legal reform.

Indeed, there is reason to suspect that the authors of the sourcebook were not even looking at their own evidence. My favorite example is the description of the Gyandoot program in Madhya Pradesh, India, which provided computer kiosks in rural areas. The sourcebook acknowledges that this project was hit hard by lack of electricity and poor connectivity and that “currently only a few of the Kiosks have proved to be commercially viable.” It then goes on to say, without apparent irony, “Following the success of the initiative . . .”

That this was no exception is confirmed by Lant Pritchett, a long-term World Bank employee and a lecturer at Harvard University, who writes in a 2001 article,

Nearly all World Bank discussions of policies and project design had the character of ‘ignorant armies clashing by the night’—there was heated debate amongst advocates of various activities but rarely any firm evidence presented and considered about the likely impact of the proposed actions. Certainly in my experience there was never any definitive evidence that would inform decisions of funding one broad set of activities versus another (e.g., basic education versus roads versus vaccinations versus macroeconomic reform) or even funding one instrument versus another (e.g., vaccinations versus public education about hygiene to improve health, textbook reform versus teacher training to improve educational quality).

How costly is the resistance to knowledge? One way to get at this is to compare the cost-effectiveness of plausible alternative ways of achieving the same goal. Primary education, and particularly the question of how to get more children to attend primary school, provides a fine test case because a number of the standard strategies have been subject to randomized evaluations. The cheapest strategy for getting children to spend more time in school, by some distance, turns out to be giving them deworming medicine so that they are sick less often. The cost, by this method, of getting one more child to attend primary school for a year is $3.25. The most expensive strategy among those that are frequently recommended (for example by the World Bank, which also recommends deworming) is a conditional cash-transfer program, such as Progresa in Mexico, where the mother gets extra welfare payments if her children go to school. This costs about $6,000 per additional child per year, mainly because most of the mothers who benefit from it would have sent their children to school even if there were no such incentive. This is a difference of more than 1,800 times.

One might object that this difference is somewhat exaggerated, since welfare payments would be good things even if they did not promote education. A more straightforward strategy would be to provide school uniforms in a place such as rural Kenya, where uniforms are required but expensive relative to what people earn. This costs about $100 per additional child per year, which is still a good 30 times the cost of deworming but one 60th the cost of conditional cash transfers. School meals are another option: they cost $35 per additional child per year, around a third of the cost of uniforms but more than ten times the cost of deworming.
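
The arithmetic behind these comparisons is worth making explicit. The sketch below simply restates the per-child costs quoted above and computes the ratios; the figures are the ones in the text, not new estimates.

```python
# Cost of one additional child-year of primary schooling, in dollars,
# as quoted in the text.
cost_per_child_year = {
    "deworming": 3.25,
    "school meals": 35.00,
    "free uniforms": 100.00,
    "conditional cash transfers": 6000.00,
}

cheapest = cost_per_child_year["deworming"]
for program, dollars in sorted(cost_per_child_year.items(), key=lambda kv: kv[1]):
    ratio = dollars / cheapest
    print(f"{program}: ${dollars:,.2f} ({ratio:,.0f}x the cost of deworming)")

# conditional cash transfers: 6000 / 3.25 is roughly 1,846 -- the
# "more than 1,800 times" difference cited earlier.
```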

Given the magnitude of the differences, choosing the wrong option can be very costly indeed. Yet all these strategies are either part of actual policies or part of policies that have been very seriously considered. Moreover, a priori reasoning, at least of the type that economists know how to do, is not much of a guide here—all the interventions sound quite sensible. Therefore, one can easily imagine one country choosing one of these, spending a lot, and getting the same results as another that spent very little. If both projects were aid-financed, someone comparing them would conclude that spending does not correlate with success in development projects, which is what one finds when one compares aid and growth across countries. And this lack of correlation is not just an artifact of comparing countries that received more or less aid and finding that the ones that got more aid did not grow faster. That comparison is obviously flawed, since countries often get more aid because they have bigger problems that make it harder for them to grow.

To avoid this kind of problem, in a recent paper my colleague Ruimin He and I asked the equivalent question at the project level—whether projects that are more generously funded by a particular multilateral donor do better than other projects within the same sector of the same country that it has also funded, but less generously. For the World Bank and the Asian Development Bank, the two organizations for which we have data, the answer turns out to be no; in the case of the World Bank the correlation is significantly negative, implying that projects that get more of their funding from the World Bank actually end up doing worse.
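
To spell out the design: each project is compared only with other projects in the same sector of the same country, a within-cell comparison. Here is a hypothetical sketch, with invented column names and data rather than the actual dataset.

```python
import pandas as pd

# Invented example: demean funding share and project outcome within each
# country-sector cell, then correlate the residuals across all projects.
df = pd.DataFrame({
    "country":       ["A", "A", "A", "B", "B", "B"],
    "sector":        ["education"] * 3 + ["health"] * 3,
    "funding_share": [0.9, 0.6, 0.3, 0.8, 0.5, 0.2],
    "outcome":       [3.0, 3.6, 4.1, 2.8, 3.5, 3.9],
})

cells = df.groupby(["country", "sector"])[["funding_share", "outcome"]]
residuals = cells.transform(lambda col: col - col.mean())

# A negative number here would mirror the World Bank finding described above.
print(residuals["funding_share"].corr(residuals["outcome"]))
```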

Opponents of aid see this lack of correlation as the ultimate proof of the radical impossibility of aid-driven development. In their resolutely puritanical view of the world, development is only possible when a country decides to take charge and make it happen, and aid is at best a petty player and at worst a distraction.

My sense is that this is much too pessimistic, in at least three related but distinct senses. First, while I recognize that aid will sometimes be given cynically, and that venal government officials will try to get their hands on the money, the intermediaries who actually give out the aid—the World Bank, USAID and the rest—are not powerless. They can make government departments compete for the money by favoring the one that comes up with the most transparent design. Indeed, one thing that must encourage corruption and misuse of funds is the fact that the donors are unclear about what they should be pushing for. Given that they come unprepared, it is easy to lead them to grandiose and unfocused project designs where none of the details are spelled out clearly and diverting money is a cinch. From this point of view the current fashion for channeling aid into broad budgetary support (rather than specific projects) in the name of national autonomy seems particularly disastrous. We need to go back to financing projects and insist that the results be measured.

Second, it is easy to forget that some of the greatest achievements of the last century were the eradication of diseases such as smallpox and polio and the development and widespread dissemination of high-yielding varieties of wheat, rice, corn, and many other crops. In each of these successes and in many others, international cooperation and aid played a central role.

Opponents of aid often respond to these examples by pointing out, correctly, that the development of these technologies was a global public good and therefore the one instance in which international intervention would be likely to succeed. This misses the key point that while these technologies were developed and funded internationally, they were disseminated in cooperation with national governments. In this sense, the challenges they faced were not unlike what anyone would face in trying to disseminate any kind of best practice in development—corrupt governments, lazy bureaucrats, cynical donors.

The reason they succeeded, I suspect, is that they started with a project that was narrowly defined and well founded. They were convinced it worked, they could convince others, and they could demonstrate and measure success. Contrast this with the current practice in development aid; as we have seen, what goes for best practice is often not particularly well founded. More often than not, it is also not immediately practicable: if the World Bank’s flagship publication, the annual World Development Report, is any indication, what goes for best practice is usually some high-level concept, like decentralization or education for girls. It leaves open the key practical questions: Decentralization how—through local governments or citizens’ associations? What kinds of citizens’ associations—informal neighborhood groups that build solidarity and voice or formal meetings where complaints get recorded and sent up? What kind of complaint-recording mechanism—secret ballots or public discussions? Getting these details right, as we saw in the case of primary schooling, can make all the difference in the world.

But there is no reason things have to be this way. This is the third sense in which aid pessimism is misplaced. The culture of aid-giving evolved from the idea that giving is good and the more money the better (what William Easterly calls the financing-gap theory), and therefore—here comes the logical leap—one need not think too hard about how the money is spent.

We have now learned that this kind of lazy giving does not work. Perhaps it took us a while to get there, but in the scheme of things even 60 years (which is about how long aid specifically designed to promote development has been going on) is but a moment in time. Large and highly political institutions such as the World Bank tend to take a while to absorb the lessons of history, especially when they are unpleasant: I do not see why this experience teaches us that aid inevitably fails.

Indeed, the time seems ripe to launch an effort to change the way aid is given. Empirical research on best practice in development has grown apace in the last decade or so, and we now have evidence on a number of programs that work. These are programs that do something very specific—such as giving deworming drugs to schoolchildren and providing a particular kind of supplemental teaching in primary schools—that have been subjected to one or several randomized evaluations and have been shown to work. Several years ago, Ruimin He and I put together a list of programs that meet these two criteria and calculated how much it would cost to scale them up to reach the entire population that needs them. While the calculation inevitably involved a lot of guesswork and was meant only to illustrate a point, the number we came up with, leaving out all income-transfer programs, was $11.2 billion a year. To compare, between 1996 and 2001 the World Bank’s International Development Association loans (the main form of World Bank aid) totaled about $6.2 billion a year. We could clearly spend all that and more without ever funding a program that does not have a demonstrated record of success, especially given that evidence on new programs is pouring in.

Attitudes are changing. A number of the larger foundations, including the Bill and Melinda Gates Foundation and the William and Flora Hewlett Foundation, have shown a strong commitment to using evidence to inform their decisions. Even more remarkably, the U.S. government’s latest aid effort, the Millennium Challenge Corporation, has expressed a strong commitment to randomized evaluations of the programs it supports. I am not naive enough to believe this effort will be easy (though enough, perhaps, to call myself an optimist). The guiltier the country, the more it will protest that it needs the independence to make its own choices and that moving away from broad budgetary support undermines its sovereignty. But we owe it to the unfortunate and oppressed citizens of these countries to hold the line.