The White House Demands Anthropic Block All Jailbreaks. Security Experts Say That’s Impossible

The White House has told Anthropic it must ensure its most capable AI model cannot be jailbroken before it is allowed back on the market. Security experts across the industry say that standard is technically unachievable for any frontier AI system.

The standoff centers on Claude Fable 5, the model Anthropic launched on June 9 to widespread acclaim as the most capable public AI system ever released. Within days, the Trump administration issued an export control directive that forced Anthropic to take both Fable 5 and its more powerful sibling, Mythos 5, entirely offline for all customers worldwide. The models remain suspended nearly a week later with no restoration date in sight.

The administration’s position, reported by Wired’s Hugo Lowell on June 17, is straightforward: if Anthropic wants to rerelease Fable 5, it must proactively ensure the model’s guardrails cannot be circumvented by anyone, under any circumstances. The NSA has concluded that methods exist to disable Fable 5’s safety constraints, particularly for capabilities related to cybersecurity, chemistry, and biology.

This goes further than the initial export control. The June 12 directive from Commerce Secretary Howard Lutnick ordered Anthropic to suspend access for any foreign national. The company could not apply that requirement selectively, so it shut down both models globally. Now the administration is demanding a technical solution to a problem the industry considers unsolvable.

Why Experts Say It Cannot Be Done

Anthropic itself acknowledged in a June 12 blog post that “perfect jailbreak resistance is not currently possible for any model provider.” The company noted that every safeguard used in the industry is vulnerable to non-universal jailbreaks, and that universal jailbreaks will likely be discovered eventually.

Security researchers have been making the same point for years: AI model guardrails are stopgap measures, not permanent barriers. A jailbreak does not require breaking the model’s underlying architecture. It can be accomplished through prompt engineering, multi-agent decomposition, Unicode manipulation, narrative framing, or any number of other techniques that sidestep the model’s safety training by exploiting how it processes language.

According to the Wired report, the expert consensus is that the White House’s goal of complete jailbreak prevention is technically infeasible. That leaves Anthropic with an impossible task: satisfying government demands that security researchers consider unachievable while keeping the model useful for legitimate enterprise and research work.

The Timeline

The chain of events moved fast. On June 9, Anthropic launched Fable 5 with what it described as defense-in-depth guardrails. On June 10, a researcher known as “Pliny the Liberator” demonstrated a jailbreak, combining multi-agent decomposition with Unicode tricks, that allegedly got the model to produce stack buffer overflow exploits.

By June 11, Amazon CEO Andy Jassy had called Treasury Secretary Scott Bessent to share findings from Amazon’s researchers, and at least five other companies had also contacted the administration. On June 12, after hours of unsuccessful efforts to get Anthropic to patch or pull the model voluntarily, Lutnick sent the export control directive. Anthropic complied within hours, disabling both Fable 5 and Mythos 5 for all users.

David Sacks, co-chair of the President’s Council of Advisors on Science and Technology, posted on X that the administration “asked Dario to fix the jailbreak or de-deploy the model. Dario refused,” a reference to Anthropic CEO Dario Amodei. He added that the administration wants to lift export controls “as soon as possible” once Anthropic remediates the issue. “The ball is in Anthropic’s court,” Sacks wrote.

High-level talks in Washington on June 15 between Anthropic co-founder Tom Brown and White House officials failed to produce a deal. As of June 17, the models remain offline.

Anthropic’s Defense

Anthropic argues that the jailbreak findings were narrow and did not justify recalling a commercial product deployed to hundreds of millions of users. In its formal statement, the company wrote: “The potential jailbreaks that have been disclosed to us are either entirely benign responses or are minor findings that provide no Mythos-specific uplift.”

The company further noted that the level of capability unlocked by the reported jailbreak “is widely available from other models (including OpenAI’s GPT-5.5) and is used every day by the defenders who keep systems safe.” Anthropic warned that if the administration’s standard were applied across the industry, it “would essentially halt all new model deployments for all frontier model providers.”

The Paradox

The situation creates a paradox with no obvious resolution. The White House is demanding technical guarantees that the world’s best AI safety researchers say are impossible to provide. Anthropic cannot promise what no company has ever delivered. Yet the administration cannot back down without appearing to accept that highly capable models can be jailbroken at will.

Kirsten Davies, the Pentagon’s Chief Information Officer, framed it in stark terms: “Some things are simply more important than revenue cycles, clickbait, and pre-IPO valuation. America First. Always.”

Anthropic’s refund deadline for subscribers who paid between June 9 and June 14 is June 20. With no deal in place and that deadline approaching, the standoff between AI safety idealism and the technical reality of imperfect guardrails is about to have concrete financial consequences for the company and its customers.

Sources: Wired (June 17, 2026); TechCrunch (June 12, 2026); Anthropic official statement (June 12, 2026); CyberScoop (June 2026)