The methods of Rachel Glennerster, Michael Kremer, and many of their colleagues and students constitute one of the most exciting innovations in the study of development. However, their research suffers from serious problems of generalizability, and their methods lack fundamental validity safeguards, an absence that hinders acceptance of their work by the broader development and scientific communities. The growing influence of their approach means that wrong conclusions could have serious consequences. And wrong conclusions are entirely possible: randomized trials are not infallible, as medicine's sordid history demonstrates. Economics could learn from that experience.

When the stakes are high, subtlety matters greatly. In 2000 the investigators of the U.K. Prospective Diabetes Study (UKPDS) published the results of a landmark randomized trial. They found that good control of blood glucose in patients with type 2 diabetes led to fewer deaths, heart attacks, and amputations and less eye disease. Lower blood glucose, the researchers concluded, leads to better health outcomes in diabetics. This insight guided diabetes care for a decade, until another trial examined whether reducing glucose below the levels tested in the UKPDS led to additional health benefits. Instead, the trial found an increase in all-cause mortality.

Misadventures in overgeneralization have honed the antennae of medical researchers, clinicians, and health consumers, compelling them to attend to the nuances of patient and disease. The result has been a proliferation of trials in ever more precisely defined populations, with ever finer implications. Applying trial results in practice is acceptable in medicine only when the patient group is precisely defined, the disease is similar among patients, and the treatment is equivalent across settings.

None of this holds for development field trials: population definitions are fuzzy, comparisons across geography and setting are tenuous, interventions differ with every trial (sometimes even within a trial), and interactions between intervention and environment are ubiquitous. This makes generalization beyond the specific setting and intervention a near impossibility.

Glennerster and Kremer argue that common results provide insight into common human behavior. Describing their work as “behavioral economics,” however, is misleading. After all, these trials aim to answer questions about what works in development, not about human behaviors. In fact, the attempts to explore the behavioral mechanisms underlying these trials’ results only expose their methodological weaknesses and threaten their validity.

With generalizability problems and a fledgling theoretical foundation, the onus is on investigators to follow the strictest scientific procedures for reducing the possibility of spurious results and suspect interpretations. Economics field trials fail in this regard on at least two important fronts: lack of trial registration and profusion of subgroup analysis.

Trial registration is a crucial tool for ensuring sound scientific practices in randomized trials. Its goal is to ensure that scientists report complete and balanced findings, not the selective results that make newspaper headlines. The incentives to produce highly influential research from expensive trials that consume several years' work are strong but scientifically irrelevant. Even the most upright investigator seeks the holy grail of publication: the significant result. Generating significant findings through small tweaks of the research design and subtle changes to inclusion and exclusion criteria is an unsavory practice, but the motivation behind it is unsurprising.

Ex ante trial registration can alleviate many of these problems. Registration of the trial design, types of intervention, and outcomes of interest prior to data collection provides a time capsule of both published and unpublished trials, as well as a signal of the theory motivating a study. For about a decade the National Institutes of Health have maintained a free trial registry to facilitate this important safeguard.

Researchers running development field trials have not taken advantage of trial registration, however, and have therefore missed an opportunity to buttress their findings against justifiable criticisms of fishing for significant results. One recent field trial, published in a prominent medical journal where registration is a condition of publication, was registered a year after the trial's completion, a suspicious gesture. Without ex ante registration, we cannot know whether the reported outcomes relate to the trial's original goals or are the product of mining the data for serendipitously significant results.

The possibility of distortion is increased by a common procedure known as subgroup analysis, in which subsets of the full population of trial participants are broken out according to characteristics such as sex and age. The risk of finding effects by chance alone, also known as false positives, rises with the number of subgroups: with twenty independent subgroup tests at the conventional 5 percent significance level, the chance of at least one spurious finding is about 64 percent. In one famous instance, a trial of aspirin administered after a heart attack seemed to reveal that the treatment was effective in all patients except those born under the astrological signs Libra and Gemini. Unfortunately, subgroup analysis without prior registration is standard in development field trials.
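To make that arithmetic concrete, here is a minimal simulation sketch. It is not drawn from any actual trial; the sample sizes, subgroup count, and variable names are illustrative assumptions. It generates trials in which the treatment has no effect at all, tests twenty arbitrary subgroups in each, and counts how often at least one subgroup nonetheless appears "significant" at the 5 percent level:

```python
# Illustrative sketch: simulate null trials (no true treatment effect),
# test arbitrary subgroups, and count how often at least one subgroup
# appears "significant" anyway. All parameters are hypothetical, chosen
# only to demonstrate the multiple-comparisons problem.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 2000      # number of simulated null trials
n_per_arm = 400      # participants in each arm
n_subgroups = 20     # arbitrary labels: sex, age bands, star signs...

trials_with_false_positive = 0
for _ in range(n_trials):
    treated = rng.normal(size=n_per_arm)   # outcomes under treatment
    control = rng.normal(size=n_per_arm)   # same distribution: no real effect
    g_treated = rng.integers(n_subgroups, size=n_per_arm)
    g_control = rng.integers(n_subgroups, size=n_per_arm)
    for g in range(n_subgroups):
        _, p = stats.ttest_ind(treated[g_treated == g],
                               control[g_control == g])
        if p < 0.05:  # a "significant" subgroup effect, by chance alone
            trials_with_false_positive += 1
            break

print(f"Null trials with at least one 'significant' subgroup: "
      f"{trials_with_false_positive / n_trials:.0%}")  # roughly 64%
```

The simulated rate hovers near the analytic value of 1 − 0.95^20 ≈ 64 percent. Registering in advance which subgroups will be tested, and correcting for the number of comparisons, is precisely the safeguard that keeps this inflation in check.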

Medicine has also suffered from publication bias, the tendency of journals to publish only significant results while ignoring inconclusive or unsurprising ones. The suppression of "negative" results generates waste as researchers unwittingly pursue hypotheses already explored. More importantly, the true effect of an intervention is best estimated by a synthesis of positive and negative trials. Without knowing which trials failed to uncover a significant effect, we cannot draw sweeping conclusions from randomized trials. Is development economics better protected from this bias than medicine?

I admire Glennerster, Kremer, and the other experimental development economists for trailblazing a new approach to understanding what works (and what does not) in their field. I am also concerned that their approach leaves too much room for skepticism. If the checkered history of out-of-sample generalizations and biased trial conduct in medicine is any indication, their conclusions may lead to harmful policy decisions.

None of this is to suggest that randomized trials cannot be useful in development economics. Past trials have been highly successful in understanding the impact of, for example, Mexico's Seguro Popular, a nationwide health-care service intended to reach 50 million uninsured. But that trial was used to inform the program rather than to provide generalized conclusions. Until the methodological wrinkles of development trials are ironed out, the field's pioneers may want to narrow the scope of the problems and solutions they address.