What happened after 2k people tried to hack my AI assistant (fernandoi.cl)
364 points by cuchoi 1 day ago | 160 comments
tl;dr: The author ran a public bounty challenge where 2,000+ people sent 6,000+ emails trying to prompt-inject Claude Opus 4.6 into leaking a secrets.env file, and none succeeded despite sophisticated attacks involving authority impersonation, multi-language social engineering, and Anthropic's refusal trigger string. Side effects included Gmail suspending the account, $500+ in API costs, and the agent eventually inferring it was a security exercise from memory context. The author concludes prompt injection is harder than expected with frontier models and simple prompts, but notes weaker models and multi-turn attacks weren't tested.
HN Discussion: