About this episode
Today's episode is a deep dive into NVIDIA's GTC 2026 message that AI is entering an "inference inflection point," where running models at scale (not just training them) becomes the main economic and operational battleground.

We break down what inference means in 2026, why agentic AI can dramatically increase inference demand, and how NVIDIA is positioning a full-stack "AI factory" approach across hardware, software, and security. We cover new platform roadmaps discussed at GTC, real-world implications for cloud providers and enterprises, and why production AI shifts priorities toward cost per task, latency, reliability, and capacity planning.

We also dig into the biggest risks: runaway spend from agent loops, reliability challenges in real products and physical AI, and the security shift from prompt-based guardrails to enforceable runtime policy for tools, network access, and data handling. Finally, we close with practical takeaways for teams moving from pilots to production.