Anthropic’s safety warnings backfired. The US government just shut down its most powerful AI

Anthropic’s safety warnings backfired. The US government just shut down its most powerful AI

On June 9, Anthropic launched Claude Fable 5, its first publicly available Mythos-class model, alongside the restricted Claude Mythos 5. Four days later, both models went offline.

The US Commerce Department’s Bureau of Industry and Security (BIS) issued an export control directive on June 12 ordering Anthropic to block all foreign national access to the models, citing national security concerns about a “narrow potential jailbreak” risk. Anthropic responded by shutting down both models globally, telling customers it had no practical way to verify the nationality of every user (TechCrunch; Reuters).

The story of how this happened contains a sharp irony. Anthropic’s approach to safety had long relied on transparency: publishing detailed system cards, openly acknowledging that perfect jailbreak resistance was impossible, and warning that frontier AI capabilities were becoming dangerously capable. That transparency, TechCrunch argued, may have given regulators the factual basis they needed to act.

A rapid unraveling

The timeline moved fast. Fable 5 launched on June 9 with an input classifier that would silently route certain high-risk queries to the weaker Claude Opus 4.8 without telling the user. Within 24 to 48 hours, pseudonymous jailbreak researcher “Pliny the Liberator” publicly claimed to have breached Fable 5’s guardrails using a multi-agent “pack hunt” technique, demonstrating stack exploit generation (TechCrunch).

On June 11, Anthropic apologized for the invisible throttling mechanism and promised to make safeguards visible — a concession that only drew more attention to how powerful the underlying model was. The next day at 5:21 PM ET, the BIS directive arrived.

Anthropic’s formal statement said the company was complying with the legal directive but disagreed that “the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people” (Anthropic). The directive went so far as to bar even Anthropic’s own foreign-national employees from using the models.

Why BIS acted

The Bureau of Industry and Security is the same agency that restricted China’s access to advanced Nvidia chips. Its authority comes from export control law, not AI safety regulation — which means this was an action about technology transfer, not about the risks of AI itself.

The concern appears to have been twofold: Fable 5’s jailbreak vulnerability could allow foreign actors to access capabilities the US considered sensitive, and Mythos 5 (the unrestricted version already deployed to about 200 Project Glasswing partners including Microsoft, Google, JPMorganChase, and Nvidia) represented an even greater risk as an unsafeguarded model in limited but international circulation (Ars Technica).

The backfire thesis

Anthropic has spent years positioning itself as the AI industry’s safety conscience, publishing detailed system cards that other companies keep confidential, and warning that frontier AI posed existential risks. The company argued that radical transparency would build trust and give it a seat at the regulatory table.

Instead, its own system cards became the evidentiary basis for a shutdown. The documentation that Anthropic published to demonstrate its safety responsibility was cited by regulators as proof that the models were dangerous enough to justify export controls. Other models with similar capabilities from less transparent companies remain online.

Anthropic said it is working to restore access to both models. The company’s blog post emphasized that the directive was based on a specific security finding, not a general judgment about the model’s safety. But for a company that bet its reputation on safety transparency being an asset rather than a liability, the experience raises an uncomfortable question for the entire industry: when honesty about your AI’s risks provides the justification for taking it away, what incentive do other companies have to be honest?

Sources: TechCrunch (June 12, 2026); Reuters (June 13, 2026); Anthropic (June 12, 2026); Ars Technica (June 12, 2026); NBC News (June 13, 2026)

Anthropic’s safety warnings backfired. The US government just shut down its most powerful AI

Leave a Comment Cancel Reply