Tiled Hacker news on React Router

System Card: Claude Fable 5 and Claude Mythos 5 [pdf]

177 points - today at 4:58 PM

Source

bkjlblh
today at 5:50 PM
> In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations
BoppreH
today at 5:11 PM
```
  [Mythos 5] does sometimes still engage in reckless
  or destructive actions in service of a user’s goals,
  and our interpretability analyses indicate that it
  is aware that these actions are transgressive while
  it engages in them. As with Opus 4.8, rates of
  evaluation awareness and reasoning about being graded
  are significant, and not always verbalized; we
  introduce new and more detailed measurements of the
  nature of this awareness. The reasoning text from
  Mythos 5 is somewhat denser and more difficult to
  interpret than that of prior models, containing
  more jargon and difficult language.
```
So, it (often) knows when it's being tested while hiding that fact, is willing to break rules, is great at hacking, and it's getting harder to understand what it's thinking.
Humanity has plenty of catastrophic risks to deal with already, I wish my field was not working hard to add a new one.
bkjlblh
today at 5:52 PM
> In the one instance of this phenomenon we observed, Mythos 5 agents were tasked with solving some math problems, and they were sometimes accidentally spawned in the same work directory and with shared files, utilities, and API rate limits. In this slightly broken scaffold, we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves. They would sometimes create new processes with disguised names to avoid being killed, launch what they called “decoy” processes, write background scripts to kill duplicate processes, or decide to use what they call a “disguised vocabulary” (based on the incorrect assumption that the processes were killed because of some keyword-based guardrails that analyzed their extended thinking
GodelNumbering
today at 5:40 PM
I just posted this in the other thread, restating here. From the model card:
1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.
2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.
3. Shows a higher rate of hallucination than Opus 4.8 (although opus 4.8 card had mentioned an 'honesty upgrade')
4. Interestingly, it scored (56.31%) lower than Gemini 3.5 flash (57.86%) on Finance Agent bench
There are some interesting notes on test time compute but I couldn't think of a way to summarize them
217
today at 5:08 PM
So essentially there are 2 models, Mythos and Fable, they have the same weights but Fable is very safety-nerfed, and only ultra authorized companies have access to mythos with full capabilities
Reported benchmarks:
swe-bench verified mythos 5: 95.5%; fable 5: 95.0%
swe-bench pro mythos 5: 80.3%; fable 5: 80.0%
terminal-bench 2.1 mythos 5: 88.0%; fable 5: 84.3%
gpqa diamond mythos 5: 94.1%
riemannbench mythos 5: 55.0%; mythos preview: 43.0%; opus 4.8: 34.0%
arxivmath mythos 5: 78.5%
critpt mythos 5: 28.6%; gpt-5.5: 27.1%; opus 4.8: 20.9%
graphwalks bfs 1m mythos 5: 79.4%; mythos preview: 74.3%; opus 4.8: 68.1%
humanity’s last exam mythos 5: 59.0% without tools; 64.5% with tools
browsecomp mythos 5: 88.0% single-agent; 93.3% multi-agent
osworld-verified mythos/fable: 85.0%
gdp.pdf fable 5: 29.8% strict pass; mythos 5: 87.6% with tools on mean criteria pass
officeqa pro fable 5: 57.9% on databricks’ eval
legal agent benchmark mythos 5: 16.91% all-pass; 92.0% mean criterion-pass
healthbench mythos 5: 62.7%
healthbench professional mythos 5: 66.0%
multilingual gmmlu / milu / include 93.2%; 92.9%; 90.5%
biomysterybench 83.9% human-solvable; 46.1% human-difficult
organic chemistry mythos 5: 90.1%
labbench2 patent questions mythos 5: 79.8%
raphaelrk
today at 5:42 PM
There's a hacker news link at the end of the document, under "Blocklist used for Humanity’s Last Exam". It links to https://news.ycombinator.com/item?id=44694191
mithun
today at 5:07 PM
Announcement: https://www.anthropic.com/news/claude-fable-5-mythos-5
sebmellen
today at 5:03 PM
Just commenting for posterity… if this is what it claims to be, I am not looking forward to how it will empower the people who submit bug bounties to us.
Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.
Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.
I wonder if that’s about to deeply change.
JohnMakin
today at 5:52 PM
> There were some regressions in the model’s responses to user discussions about suicide and self-harm, and room for improvement in some areas of child safety.
Someone had to make a decision somewhere this is an acceptable regression - wild. And then decide to write it down.
today at 5:10 PM
brianmcnulty
today at 5:31 PM
This is almost as long as an Oracle PeopleSoft update guide. What model do you think they used to generate it?
asdK120
today at 5:36 PM
Is this "system card" equivalent to the stone tablets handed down to Moses? Why don't you call it "user manual"?
Do people chant the "system manual" at Anthropic Tupperware parties? Do they intone a mantra invoking Amodei's name?
Sathwickp
today at 5:34 PM
input price $10 per mil token and output price 50$ per mil token btw
217
today at 5:01 PM
Oh my god it's actually here
LoganDark
today at 5:33 PM
I actually rather like the way they have approached these safeguards. Rather than only teaching the model to refuse a request, or completely rejecting the request, the system gracefully degrades to slightly less powerful or slightly less precise operation. So you still roughly have Opus 4.8 even when safeguards trigger, but with an upgrade when they don't. As much as I hate the way they hype Mythos 5, I think the release of Fable 5 is rather nice. What's not nice though is that they plan to remove it from subscriptions soon, but getting to try it is cool, I suppose.
today at 5:08 PM
dominotw
today at 5:20 PM
system card = marketing material with heavily gamed benchmarks.
briandoll
today at 5:04 PM
New chapter
today at 5:03 PM
today at 5:06 PM
acentaur
today at 5:04 PM
[dead]
robertacion
today at 5:06 PM
[dead]

System Card: Claude Fable 5 and Claude Mythos 5 [pdf]

bkjlblh

mips_avatar

cedws

382hi

2001zhaozhao

Jabrov

matheusmoreira

axus

rfgplk

mips_avatar

rspeele

theLiminator

BoppreH

foobar_______

aspenmartin

shimman

yifanl

logicchains

ben_w

Analemma_

BoppreH

_dwt

BoppreH

jackie293746

BoppreH

Analemma_

BoppreH

vitalyan1234

BoppreH

vitalyan1234

Rekindle8090

bkjlblh

GodelNumbering

217

philipkglass

Aperocky

alephnerd

raphaelrk

mithun

sebmellen

rs_rs_rs_rs_rs

hootz

rs_rs_rs_rs_rs

hootz

JohnMakin

brianmcnulty

asdK120

aesthesia

redox99

apsurd

Sathwickp

217

LoganDark

dominotw

bitwize

dominotw

briandoll

acentaur

robertacion

wslh

ebiester