
The replication crisis that has rattled psychology, economics, and biomedicine over the past decade may be a symptom of a deeper problem, argues a new Nature commentary by University of Chicago economist John A. List. The real issue, he contends, is not whether findings replicate in the lab but whether they generalize to the settings where people actually live.
List, who serves as chief economist at Walmart while holding professorships at the University of Chicago and the Australian National University, makes the case that behavioral and social science should pivot decisively toward natural field experiments. In these studies, participants go about their everyday lives unaware they are being observed, while researchers vary some feature of their environment in a controlled way.
“In my view, one solution to the problem is to use a greater number of natural field experiments,” List writes. “By studying people in their natural setting, assuming that strict ethical rules are followed, researchers can be more confident that their findings will be relevant to that group.”
The Three-Stage Generalizability Problem
List identifies three distinct points at which the link between a study and the real world breaks down.
The first is population selection. The target population of a study, the group researchers have in mind, often differs from the population that ultimately receives the intervention or policy. Clinical trials, for example, were historically run on middle-aged white men, and their results were then applied to women and other demographic groups.
The second is participant selection, a subtler but pervasive distortion. Lab studies require consent, consent requires awareness, and people who volunteer for behavioral experiments are not representative of the general population. Someone who shows up for a $20 psychology experiment on a Tuesday afternoon likely has a flexible schedule and feels comfortable in an academic setting, qualities that correlate with a range of other characteristics.
The third is situation selection: the experimental setting itself creates an artificial context. The scrutiny of observation, the unfamiliar stakes, and the social cues of a university laboratory all differ from the messy reality of a supermarket aisle, a schoolyard, or a stock exchange floor.
List illustrates the point with his own 2006 study of trading card dealers. “When dealers knew that they were being watched, they offered cards of higher quality than buyers could verify on the spot, a costly act of reciprocity unrelated to any prospect of repeat business,” he writes. “On the market floor, by contrast, reciprocity was strategic: generosity was extended only when reputation and repeat business made it economically rational.”
Generalizing from a setting that mutes the normal consequences of decisions, he argues, leads to erroneous inference and flawed policy.
Classic Examples of Generalization Failure
The commentary revisits several well-known cases where promising small-scale results collapsed at larger scale. The “Scared Straight” programme brought at-risk teenagers to maximum-security prisons in the 1970s and 1980s. Early pilots reported that 80 to 90 percent of participants stayed out of trouble. But when the programme was scaled up and studied in controlled trials, it failed to reduce offending, and in some locations criminal behaviour among participants actually increased.
School deworming programmes that substantially reduced absenteeism in Kenya showed mixed or weaker effects in other countries. School meal programmes that increased attendance in Burkina Faso had limited impacts elsewhere.
These failures, List argues, are not evidence that the early studies were wrong. They are evidence that human behaviour is context-dependent, and that conventional lab and survey research, which overwhelmingly draws on what psychologists call WEIRD populations (Western, Educated, Industrialized, Rich, Democratic), systematically overestimates the portability of its findings.
Natural Field Experiments as a Methodological Fix
Natural field experiments bypass the worst of these problems. Because participants do not know they are being studied, there is no self-selection into the experiment. Because the setting is the real environment (shopping, donating, working, commuting), there is no artificial context to distort behaviour.
They do not automatically solve the population-selection problem (the researcher still chooses which population to study), but they cleanly separate population differences from experimental artifacts.
Three developments, List writes, make natural field experiments more viable now than in the past. The replication crisis has created institutional demand for more rigorous methods. The technology sector runs tens of thousands of such experiments daily and has built infrastructure that academics can borrow. And a growing body of formal theory on generalizability, including List’s own 2024 framework, provides tools for predicting when results will transfer across settings and when they will not.
At Walmart, List’s team is running natural field experiments with more than 6,000 suppliers, testing which incentives most effectively reduce carbon emissions. The scale is far beyond what a university laboratory could achieve.
Ethical Boundaries
Natural field experiments raise ethical questions that controlled lab studies do not. List addresses them directly, citing the Belmont Report framework: participants must be exposed to no more than minimal risk, and only to experiences they would normally encounter. Incomplete disclosure is justified only when it is necessary to accomplish the research goals, carries no undisclosed risks that are more than minimal, and is paired with an adequate debriefing plan.
In a sense, the commentary is itself a test of how ideas travel in science. It appears in one of the world’s most prestigious journals and argues that methodological reform, not just replication, should be the priority. Whether List’s argument generalizes beyond economics to the other fields shaken by the replication crisis, such as psychology and biomedicine, may depend on whether researchers are willing to leave the controlled conditions of their own laboratories and follow their subjects into the field.
Source
List, J.A. “Make science more reliable: study people as they go about their lives.” Nature 654, 863–866 (2026). DOI: 10.1038/d41586-026-01957-z.

