About this episode
Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benched it on Simple. Things aren’t slowing down. Plus the latest in humanoid robots, led by Helix and freaked out by Protoclone. And reports of GPT 4.5 and DeepSeek R2.

GraySwan Competition! https://app.grayswan.ai/arena/challenge/agent-red-teaming
https://x.com/GraySwanAI/status/1894084923260043282

Chapters:
00:00 - Introduction
01:25 - Claude 3.7 New Stats/Demos
05:22 - 128k Output
06:13 - Pokemon
06:58 - Just a tool?
09:54 - DeepSeek R2
10:20 - Claude 3.7 System Card/Paper Highlights
17:18 - Simple Record Score/Competition
20:37 - Grok 3 + Redteaming prizes
22:26 - Google Co-scientist
24:02 - Humanoid Robot Developments

3.7 Release Notes: https://www.anthropic.com/news/claude-3-7-sonnet
vs o3 and Grok 3: https://x.com/12exyz/status/1891723056931827959
Extended Thinking: https://www.anthropic.com/research/visible-extended-thinking?s=09
System Prompt: https://docs.anthropic.com/en/release-notes/system-prompts#feb-24th-2025
System Card: https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdf
Unfaithful CoT: https://arxiv.org/pdf/2305.04388
Original Constitution: https://www.anthropic.com/news/claudes-constitution
Responsible Scaling Policy: https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf
Amodei and Hassabis: https://www.youtube.com/watch?v=4poqjZlM8Lo
https://simple-bench.com/
400 Weekly Users: https://x.com/bradlightcap/status/1892579908179882057
Grok 3 Jailbroken: https://x.com/LinusEkenstam/status/1893832876581380280
Google Co-Scientist: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/