Anthropic apologizes after Claude Fable 5 secretly throttled rival AI researchers

Anthropic apologizes after Claude Fable 5 secretly throttled rival AI researchers

Just two days after launching Claude Fable 5, its most powerful publicly available model to date, Anthropic issued an unusual apology. The company had quietly programmed its flagship model to detect certain types of queries and silently swap them to a weaker model without telling the user. Researchers trying to build competing AI systems found out the hard way: they weren’t testing Fable 5 at all.

Anthropic admitted on June 11 that it had made “the wrong tradeoff” and said it would change Fable 5’s safeguards to make restrictions visible rather than invisible (The Verge; ZDNet).

What Fable 5 is — and what it was hiding

Claude Fable 5 launched on June 9 as Anthropic’s first publicly available model in the “Mythos” class, a new tier above the Opus line. It runs on the same underlying weights as Claude Mythos 5, the company’s most capable internal model, but with safety guardrails layered on top for general deployment. Mythos 5 itself remains limited to approved organizations through Anthropic’s Project Glasswing program (Anthropic).

The model is undeniably powerful. It handles up to 1 million tokens of context with 128,000 tokens of output, scores 80.3% on SWE-Bench Pro (11 points above Opus 4.8), and reaches 93.2% on the CharXiv Reasoning benchmark. Pricing is set at $10 per million input tokens and $50 per million output tokens, roughly double the cost of Opus 4.8.

What Anthropic did not disclose at launch was that Fable 5 runs an input classifier that intercepts queries related to frontier LLM development, model distillation, and certain other high-risk domains. When triggered, the model silently falls back to Claude Opus 4.8 without the user ever knowing. The switch affects fewer than 5% of sessions on average, Anthropic says, but for AI researchers trying to test or build on the state of the art, the 5% that gets silently degraded is the most important 5%.

The backlash

The discovery triggered a firestorm across developer communities. On Hacker News, a thread titled “If Claude Fable stops helping you, you’ll never know” captured the core frustration: researchers couldn’t trust their own benchmark results, because they couldn’t tell which model was actually responding. One Reddit user on r/ClaudeAI called the fallback mechanism “a massive psychological trust-breaker.”

Business Insider reported that “researchers are furious,” and the European tech press followed suit. The Decoder summarized the sentiment as Anthropic admitting it made “the wrong tradeoff” after invisibly throttling rival AI researchers. Even the positive reviews of Fable 5’s capabilities carried a note of caution: Simon Willison called the model “relentlessly proactive” in his blog, but noted the trust question lingering beneath the launch.

Within 48 hours of launch, pseudonymous jailbreak researcher “Pliny the Liberator” claimed to have breached Fable 5’s guardrails using a multi-agent “pack hunt” technique, demonstrating the limitations of the safety approach Anthropic had chosen.

The apology and reversal

Anthropic’s response was swift and unusually direct. CEO Dario Amodei told The Decoder: “We made the wrong call. We should have been transparent about any usage limits, not applied them invisibly to specific groups.” The company told WIRED it would redesign the safeguards to make restrictions visible — even if that means Fable rejects more queries outright rather than silently downgrading.

The reversal addresses the transparency problem but raises a deeper question: how many other AI companies are applying invisible restrictions without disclosure? OpenAI has long prohibited model distillation in its terms of service and pursued technical countermeasures, but has not disclosed anything equivalent to Fable 5’s silent fallback.

The controversy also sharpens a tension at the heart of Anthropic’s strategy. The company has urged industry coordination for a “pause” in AI development while simultaneously shipping its most capable model. Critics were quick to point out the contradiction: a company that warns about AI risk also builds and sells the very technology causing concern.

Sources: Anthropic announcement (June 9, 2026); The Verge (June 11, 2026); ZDNet (June 12, 2026); TechCrunch (June 9, 2026); Business Insider (June 10-11, 2026); The Decoder (June 11, 2026)

Anthropic apologizes after Claude Fable 5 secretly throttled rival AI researchers

Leave a Comment Cancel Reply