About this episode
AI benchmarks are breaking: they are saturated, gamed, and increasingly disconnected from real-world performance. This episode explores why that's happening and how new tests like ARC AGI 3 aim to measure actual learning and reasoning instead of memorization. In the headlines: Apple's deeper Gemini plans, a major efficiency breakthrough from Google, and rising political tension around AI infrastructure.

Brought to you by:
KPMG - Agentic AI is powering a potential $3 trillion productivity shift, and KPMG's new paper, Agentic AI Untangled, gives leaders a clear framework to decide whether to build, buy, or borrow. Download it at www.kpmg.us/Navigate
Mercury - Modern banking for business and now personal accounts. Learn more at https://mercury.com/personal-banking
Recall - The API for meeting recording. Get started today with $100 in free credits at https://www.recall.ai/aidb
AIUC-1 - Get your agents certified to communicate trust to enterprise buyers - https://www.aiuc-1.com/
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/