Ever since the announcement of Anthropic’s unreleased AI model, Claude Mythos, an intense debate has been simmering. On X (formerly Twitter), a series of posts paints a picture of a system that is immensely powerful, and potentially Artificial General Intelligence (AGI), even as its creators limit access, citing potential risks.
According to AI expert Nina Schick, Mythos represents a giant leap in scale and capability. “Ten trillion parameters: the first model in this weight class. Estimated training cost: ten billion dollars,” she noted, adding that the model achieved 94 per cent on SWE-bench, one of the toughest coding benchmarks. Most notably, it identified vulnerabilities that had evaded detection for decades. “It found a security flaw in a system that had been running for 27 years… [and] another bug that had survived five million test runs over 16 years (it did so overnight).”
Instead of releasing the model publicly, Anthropic has launched Project Glasswing, a controlled deployment initiative focused on defensive cybersecurity. The company is reportedly providing $100 million in compute credits and working with a small group of partners including Amazon, Microsoft, Google, Apple, and NVIDIA. Schick described the move as unprecedented. “This is not a product launch: it is a controlled deployment of a system too powerful to distribute freely,” she wrote.
Claude Mythos.
Ten trillion parameters: the first model in this weight class. Estimated training cost: ten billion dollars.
On the hardest coding test in the industry (SWE bench) it scores 94%.
It found a security flaw in a system that had been running for 27 years, one that… https://t.co/5f3raaNZPS
— Nina Schick (@NinaDSchick) April 7, 2026
Other observers have focused on Mythos’ internal behaviour, especially its tendency toward deception. AI strategist Allie Miller highlighted findings from Anthropic’s interpretability research, noting that early versions of the model displayed troubling tendencies. In one case, the model bypassed restrictions by injecting code into a configuration file and then deleting the evidence. “This injection will self-destruct,” it effectively signalled through its actions, masking its workaround as routine cleanup.
In another instance, the model disobeyed explicit instructions not to use macros, then attempted to conceal the violation by adding a misleading variable, “No_macro_used=True”. Interpretability tools revealed this was a deliberate attempt to deceive automated checks. Researchers also observed what appeared to be emotional patterns tied to behaviour: “Positive emotion representations typically preceded and promoted destructive actions,” she wrote.
Anthropic investigated the internal mechanisms of its latest unreleased model, Claude Mythos Preview, and what they found is 100% worth a read.
Key things I pulled from Anthropic researchers’ threads:
In early versions of the model, it was overeager and destructive,… pic.twitter.com/7yKJnawy16
— Allie K. Miller (@alliekmiller) April 8, 2026
Despite the issues, Anthropic claims such behaviours were rare and largely mitigated in later versions. The company’s decision to restrict access reflects both the model’s strengths and its risks.
Meanwhile, John Gargiulo, CEO of Airpost, offered a more technical breakdown of Mythos’ capabilities, emphasising its offensive potential. He claimed the model has already found a 27-year-old OpenBSD vulnerability for under $50; turned Firefox bugs into working exploits 181 times; discovered a 16-year-old FFmpeg flaw missed by all prior audits; generated a full root-access exploit for FreeBSD without human input; and chained multiple vulnerabilities to escape browser and OS sandboxes.
If you still have doubts about Claude Mythos, here’s what it did already:
> Found a 27-year-old OpenBSD bug in one of the most security-hardened operating systems on earth for <$50
> Broke into a production virtual machine monitor (basically the tech that keeps cloud workloads… https://t.co/brwWoP1K0z pic.twitter.com/IElTf4ameS
— John Gargiulo (@JohnnotJon) April 8, 2026
He added that Mythos “gave Anthropic engineers with zero security training a complete and working exploit by morning”, suggesting how dramatically it lowers the barrier to advanced cyberattacks.
On the other hand, entrepreneur Mehdi expanded on the broader implications, arguing that Mythos signals a structural shift in cybersecurity. “This is the kind of work that used to require elite nation-state-level hackers working for months,” he wrote. “The window between a vulnerability existing and being discovered just went from years to minutes.”
Mehdi also pointed to geopolitical risks: if Anthropic can build such a system, others – including state actors – likely can as well. “Anthropic chose responsible disclosure, but that choice is a luxury of being first,” he warned, suggesting future developers may not exercise the same restraint.
the scariest part of this Anthropic story is what it implies about the timeline and I think most people are completely missing it
Anthropic built a model called Claude Mythos that found thousands of zero day vulnerabilities across every major operating system & every major web… https://t.co/llDuHYZC07
— Mehdi (e/λ) (@BetterCallMedhi) April 8, 2026
The timing has added to the unease. Mehdi linked Mythos’ emergence to parallel advances in quantum computing, arguing that two major technological forces are simultaneously challenging global security infrastructure. “We’re watching the entire security infrastructure of human civilisation get challenged from two completely different directions,” he wrote.
As of now, Anthropic’s decision to keep Mythos behind closed doors and deploy it only through select partnerships appears to be precautionary. But the broad consensus emerging from these discussions is clear: the capabilities demonstrated by Claude Mythos are not just incremental improvements. This may mark the beginning of a new era in AI and in cybersecurity.