
In April 2026, the New England Journal of Medicine, one of the most prestigious medical journals in the world, retracted a case study. The article, in the journal’s “Images in Clinical Medicine” section, described an 87-year-old man who coughed up bronchial casts after exposure to a forest fire. The image showed black, branching airway casts next to a measuring tape.
The problem was the measuring tape. An anonymous commenter on PubPeer noticed that the ruler markings between the 30- and 40-centimeter marks read “1, 3, ?, 4, ?”, a classic AI-generated artifact. The authors admitted they had used AI to move the ruler to the top of the image, saying they had been unaware of journal policies on image manipulation.
It was the first retraction for the NEJM since the 2020 Surgisphere scandal.
The NEJM case is a high-profile symptom of a much larger problem, according to Nan Li, an associate professor of science communication at the University of Wisconsin-Madison. In an article for The Conversation, republished by Live Science, Li argues that AI-generated and AI-manipulated images are entering the peer-reviewed literature across fields, and that the systems designed to catch them are already falling behind.
A problem with depth
The most visible cases are obvious: images that a human reviewer would find laughable. In 2024, Frontiers in Cell and Developmental Biology retracted a paper that included what appeared to be a Midjourney-generated rat with massively disproportionate genitals and nonsensical labels like “iollotte sserotgomar cell” and “dck.” One reviewer had flagged the image before publication; their concerns were overridden.
But most cases are subtler, and evidence is growing that AI-generated and AI-modified images slip past review in fields from materials science to medicine. Researchers at ETH Zurich published a paper on arXiv titled “The Unwinnable Arms Race of AI Image Detection” (arXiv:2509.21135), formally demonstrating that as generator capability improves, detection accuracy follows a U-shaped curve: detection gets easier for a time, then harder again as the two systems converge.
“Systems designed to detect fake images will almost always lag behind systems designed to create them,” Li writes.
The detection gap
Current state-of-the-art AI image detectors achieve 70 to 90 percent accuracy on images from known generators. On next-generation models, that drops to 50 to 60 percent, barely better than random guessing.
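To put those percentages in concrete terms, the short sketch below converts an accuracy figure into wrong calls per 1,000 images and into a margin over chance. It assumes a test set split evenly between real and generated images, so that random guessing scores 50 percent; the function names are illustrative and do not come from any of the cited studies.

```python
# Assumption: a balanced test set (half real, half generated), so a detector
# that guesses at random is expected to score 50 percent.
CHANCE_PERCENT = 50


def misclassified_per_thousand(accuracy_percent: int) -> int:
    """Images out of 1,000 that a detector at this accuracy labels wrongly."""
    return (100 - accuracy_percent) * 10


def margin_over_chance(accuracy_percent: int) -> int:
    """Percentage points by which the detector beats random guessing."""
    return accuracy_percent - CHANCE_PERCENT


# Accuracy ranges quoted in the article.
scenarios = [
    ("known generators, high end", 90),
    ("known generators, low end", 70),
    ("next-generation models, high end", 60),
    ("next-generation models, low end", 50),
]

for label, accuracy in scenarios:
    print(
        f"{label}: {misclassified_per_thousand(accuracy)} wrong per 1,000, "
        f"{margin_over_chance(accuracy)} points above chance"
    )
```

Under that assumption, even the best quoted figure leaves 100 wrong calls per 1,000 images, and the low end for next-generation models is indistinguishable from guessing.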
The gap matters because detection is only half of the solution. A broader response is emerging around provenance: cryptographically signed metadata that travels with an image from its point of creation. The leading standard is C2PA (Coalition for Content Provenance and Authenticity), supported by Adobe, Microsoft, Google, OpenAI, and camera manufacturers including Leica, Nikon, and Canon. OpenAI now attaches both C2PA metadata and Google’s SynthID invisible watermarking to images generated by ChatGPT.
But C2PA has a fundamental weakness: stripping the metadata, through screenshots, re-uploads, or format conversion, removes the provenance chain entirely. The standard certifies that a digital file has not been tampered with, but cannot certify that the scene depicted is real.
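Both limits can be seen in a toy model of signed provenance. The sketch below is not the C2PA format or API: it stands in a shared-secret HMAC for the certificate-based signatures a real implementation would use, and the key, the function names, and the sample byte strings are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical signing key; a real provenance system would use a private key
# tied to a certificate rather than a shared secret.
SIGNING_KEY = b"hypothetical-camera-key"


def make_manifest(pixels: bytes, claim: str) -> dict:
    """Bind a claim to the image bytes by signing the claim plus a hash."""
    digest = hashlib.sha256(pixels).hexdigest()
    payload = f"{claim}|{digest}".encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "digest": digest, "signature": signature}


def verify(pixels: bytes, manifest) -> str:
    """Report what the manifest can and cannot say about these bytes."""
    if manifest is None:
        return "no provenance"  # metadata stripped: nothing left to check
    payload = f"{manifest['claim']}|{manifest['digest']}".encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "manifest forged"
    if hashlib.sha256(pixels).hexdigest() != manifest["digest"]:
        return "content changed"
    return "verified"


original = b"pixels captured by the camera"
manifest = make_manifest(original, "captured by camera")

print(verify(original, manifest))            # verified
print(verify(b"pixels after an edit", manifest))  # content changed
print(verify(original, None))                # no provenance (e.g. a screenshot)

# A staged or fabricated scene photographed by a signing camera still
# verifies: the signature covers the bytes, not the truth of the scene.
staged = b"pixels of a staged scene"
print(verify(staged, make_manifest(staged, "captured by camera")))  # verified
```

The third and fourth checks correspond to the two weaknesses described above: once the manifest is gone there is nothing to verify, and a valid signature says nothing about whether what was photographed is genuine.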
Publishers respond
Major journals have begun updating their policies in response. Springer Nature has banned generative AI images from publications except for narrow exceptions, and requires disclosure of any AI use in the manuscript. Elsevier’s updated policy, released June 2026, prohibits AI from creating or altering primary research images, including microscopy, histology, Western blots, and radiology scans, and requires detailed disclosure of any AI tools used.
The Science family of journals, under Editor-in-Chief Holden Thorp, has taken the most aggressive stance, classifying AI violations as scientific misconduct. In a January 2026 editorial titled “Resisting AI slop,” Thorp wrote that reviewers must not upload manuscripts to AI tools, and that any AI use must be disclosed in the cover letter, Methods, and Acknowledgments.
arXiv, the preprint repository, announced in May 2026 that it would impose one-year bans on authors who submit papers with “incontrovertible evidence” of unchecked AI generation, such as hallucinated references or meta-comments from LLMs left in place.
The scale of the problem
The numbers are sobering. Approximately one in eight biomedical papers now contains AI-generated text, according to a preprint study from January 2026. A survey of 6,957 submissions to the journal Organization Science found a 42 percent surge since the release of ChatGPT, with more than 50 percent of manuscripts showing AI involvement by early 2026.
The NIST GenAI Challenges, a formal evaluation program, have teams competing as generators, prompters, and discriminators, reflecting the arms-race dynamic. NIST’s AI 100-4 framework, published in April 2024, covers digital watermarking, metadata provenance, and synthetic content detection, but the agency acknowledges the field is moving faster than standards can be developed.
What the crisis means
The infiltration of AI-generated images threatens something fundamental about scientific publishing: the assumption that published images are honest representations of what was observed. Without that assumption, journals must screen images the way they screen text, a massive undertaking that most publishers are not equipped for.
“Without standards,” Li writes, “science risks entering a world where every image can be questioned and no image carries inherent credibility.”
The question is not whether AI-generated images will continue to enter the scientific literature. That war is already lost. The question is whether the scientific community can build a system that makes it possible to tell the difference between a real image and a generated one, before the distinction itself becomes impossible.
Source: Live Science and The Conversation, by Nan Li (University of Wisconsin-Madison). Additional reporting from Retraction Watch, Nature Communications, and arXiv.

