Anthropic's New AI Development: Claude Mythos Details Leak

Synopsis

A data leak revealed that Anthropic is developing “Claude Mythos”, what it describes as its most powerful AI model yet, now in early testing. Exposed files showed details about the new model and the cybersecurity risks it may pose. The company blamed human error for the leak.
A data leak has revealed that Anthropic is developing a new artificial intelligence model it claims is its most powerful yet, with the system already being tested by a small group of users.

A report in Fortune quoted an Anthropic spokesperson as saying the system is “the most capable we’ve built to date.”

Big leak

Details about the model emerged after internal material was accidentally exposed in a public data store. In total, 3,000 assets linked to Anthropic’s blog were accessible online. These included early drafts of announcements and other internal content that had not yet been released publicly.

Among the files was a draft blog post referring to the model as “Claude Mythos” and warning that it could pose serious cybersecurity risks.

The same leak also pointed to a planned, invite-only CEO summit in Europe, part of the company’s push to promote its AI systems to large businesses.

The company later said the leak had occurred due to “human error,” specifically in how its content management system (CMS) was set up. It described the material as “early drafts of content considered for publication” and has since restricted access to the data.

A new generation of AI models

The leaked draft also referred to a new category of models under the name “Capybara.” According to the document, this would represent a step beyond the company’s current top-tier models.

“‘Capybara’ is a new name for a new tier of model: larger and more intelligent than our Opus models—which were, until now, our most powerful,” Anthropic said in one leaked blog post.

Capybara and Mythos appear to refer to the same underlying model, according to Fortune.

Currently, Anthropic offers models at three levels: Opus, Sonnet and Haiku, which vary in size, cost and capability. Opus is the largest and most capable, designed for complex tasks but at a higher cost. Sonnet is a mid-tier option, balancing performance, speed and price. Haiku is the smallest, fastest and cheapest, suited for simpler use cases.

The new system appears to sit above Opus, making it both more advanced and more expensive. The document also suggested that training for “Claude Mythos” had already been completed.

Cybersecurity concerns

The leaked material highlights growing concern within the company about the risks linked to more advanced AI systems, Fortune said.

“In preparing to release Claude Capybara, we want to act with extra caution and understand the risks it poses—even beyond what we learn in our own testing. In particular, we want to understand the model’s potential near-term risks in the realm of cybersecurity—and share the results to help cyber defenders prepare,” the document said, according to Fortune.

In simple terms, Anthropic believes the model could be used to find and exploit weaknesses in software much faster than current tools. This raises the risk of more frequent and large-scale cyberattacks if such systems fall into the wrong hands.

Because of this, the company plans to release the model carefully, starting with trusted organisations. “We’re releasing it in early access to organisations, giving them a head start in improving the robustness of their codebases against the impending wave of AI-driven exploits,” Anthropic said in the draft blog, according to Fortune.

Real-world misuse already detected

Anthropic has already seen attempts to misuse its AI systems. The company said hacking groups, including some linked to China, have tried to exploit its tools in real-world operations.

In one case, a state-backed group used Claude Code in a coordinated effort targeting around 30 organisations, including technology companies, financial institutions and government bodies. Anthropic said it identified the activity, blocked the accounts involved and informed those affected within days.

The incident underlines the wider challenge facing AI companies: building more powerful systems while trying to limit how they might be misused.