Anthropic details Claude Fable 5’s cybersecurity safeguards and AI jailbreak framework

1 hour ago 1

Anthropic’s newest AI model got grounded by the US government, and the company is now showing its homework on how it plans to keep things locked down going forward.

Following a security incident in June 2026 that led the US Commerce Department to temporarily suspend global access to both Claude Fable 5 and its more powerful sibling Mythos 5, Anthropic has detailed a suite of cybersecurity safeguards and proposed an early framework for assessing how bad AI jailbreaks actually are.

What happened, and what Anthropic built in response

Fable 5 and Mythos 5 launched in early June 2026. Amazon researchers subsequently identified a narrow jailbreak vulnerability in Fable 5, one that allowed limited exploitation of known vulnerabilities. The jailbreak was characterized as non-universal, meaning it wasn’t some skeleton key that unlocked everything.

The US Commerce Department enforced export controls and temporarily yanked global access to both models. Access was restored on July 1, 2026, after Anthropic deployed a new safety classifier that blocks over 99% of attempts replicating the reported jailbreak.

For users whose requests get flagged by this classifier, the system reroutes them to the less capable Claude Opus 4.8 model.

Mythos 5 operates under stricter guardrails by default. The model is restricted to approved partners and offers greater cyber capabilities than Fable 5.

Project Glasswing and the jailbreak severity framework

Anthropic has been working with Amazon, Microsoft, and Google under something called Project Glasswing to propose an industry-wide system for evaluating jailbreak severity. The framework prioritizes mitigations based on two dimensions: how discoverable a jailbreak is, and how much damage it can actually cause.

Anthropic has also been running its HackerOne bug bounty program with a specific focus on cyber jailbreaks targeting Fable 5.

The pricing and access equation

Fable 5 sits at $10 per million input tokens and $50 per million output tokens.

What this means for investors and the broader market

The involvement of Amazon, Microsoft, and Google in Project Glasswing is worth watching closely. When the three largest cloud platforms collaborate on safety standards, those standards tend to become the de facto industry baseline.

The jailbreak severity framework, if adopted broadly, could reshape how AI companies allocate their security resources. Instead of treating every vulnerability report as a five-alarm fire, teams would triage based on structured criteria.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.

Read Entire Article