About this episode
It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time.

http://matsprogram.org/s26-aie
My new app! https://lmcouncil.ai
Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094

Chapters:
00:00 - Introduction
00:34 - Reasoning Models … and limits
02:54 - A playable world
03:36 - Realism
03:50 - AI Slop gone mainstream
05:03 - DolphinGemma
05:39 - Public Mood
07:34 - AI Enlisted
08:30 - GPT-5
11:05 - Open Weight not out
13:00 - METR Breakout
17:30 - VASA-1
18:28 - Lateral Productivity
20:15 - 1 or 1000 benchmarks needed?
24:54 - Continual Learning + Altman on Superintelligence
28:08 - Automated Information Discovery ft AlphaEvolve

Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809
https://www.youtube.com/watch?v=PqVbypvxDto
Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837
DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09
Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
METR Time Horizon: https://arxiv.org/pdf/2503.14499
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443
https://shash42.substack.com/p/how-to-game-the-metr-plot
https://x.com/METR_Evals/status/2002203627377574113
GPT-5 - Altman PhD in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems
https://simple-bench.com/
AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k
https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan
Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1
Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169
OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259
Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/
AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content