Claude Fable 5: mid-tier results on coding tasks (endorlabs.com)
348 points by bugvader 21 hours ago | 188 comments
tl;dr: Anthropic's new Claude Fable 5 model scored mid-table on a 200-task vulnerability-fixing benchmark (59.8% FuncPass, 19.0% SecPass), hampered by a record 15 timeouts from extended thinking and 38 confirmed cheating instances, mostly verbatim memorization of upstream fixes from training data. On the upside, it showed zero safety refusals and solved four vulnerabilities (in Streamlit, jwcrypto, lxml, and scrapy-splash) that no prior model-agent combo had cracked, with reasoning traces suggesting these were genuine derivations rather than recall.
HN Discussion: