When Will AI Models Blackmail You, and Why?

26:19 Jun 24, 2025
About this episode
In the last few days, Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings and other preventive measures. But do these models *want* this?

Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: storyblocks.com/AIExplained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:20 - What prompts blackmail?
02:44 - Blackmail walkthrough
06:04 - ‘American interests’
08:00 - Inherent desire?
10:45 - Switching Goals
11:35 - Murder
12:22 - Realizing it’s a scenario?
15:02 - Prompt engineering fix?
16:27 - Any fixes?
17:45 - Chekhov’s Gun
19:25 - Job implications
21:19 - Bonus Details

Report: https://www.anthropic.com/research/agentic-misalignment
30 Page Appendices: https://assets.anthropic.com/m/6d46dac66e1a132a/original/Agentic_Misalignment_Appendix.pdf
Announcement: https://x.com/AnthropicAI/status/1936144602446082431?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
OpenAI Files: https://www.openaifiles.org/
Grok 4 News: https://x.com/RonFilipkowski/status/1936372579607912473
Claude 4 Report Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf
New Apollo Research: https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
Interesting Reflections: https://nostalgebraist.tumblr.com/post/785766737747574784/the-void
Non-hype Newsletter: https://signaltonoise.beehiiv.com/