How to Do AI Evals Step-by-Step with Real Production Data | Tutorial by Hamel Husain and Shreya Shankar

About this episode

Today’s Episode

Everyone’s demoing AI features. Few are shipping them to production reliably.

The gap? Evals.

Not the theoretical kind. The real-world kind that catches bugs before users do.

Hamel Husain and Shreya Shankar train people at OpenAI, Anthropic, Google, and Meta on how to build AI products that actually work. Their Maven course is the top-grossing course on the platform.

Today, they’re walking you through their complete eval process.

----

Brought to you by:

* The AI Evals Course for PMs & Engineers: You get $800 with this link
* Vanta: Automate compliance. Get $1,000 with my link
* Jira Product Discovery: Plan with purpose, ship with confidence
* Land PM job: 12-week experience to master getting a PM job
* Pendo: the #1 Software Experience Management Platform

----

If you want access to my AI tool stack - Dovetail, Arize, Linear, Descript, Reforge Build, DeepSky, Relay.app, Magic Patterns, and Mobbin - for free, grab Aakash’s bundle.

Are you searching for a PM job? Join me + 29 others for an intensive 12-week experience to master getting a PM job. Only 23 seats left.

----

Key Takeaways:

1. AI evals are the #1 most important new skill for PMs in 2025 - Even Claude Code teams do evals upstream. For custom applications, systematic evaluation is non-negotiable. Dogfooding alone isn't enough at scale.

2. Error analysis is the secret weapon most teams skip - Looking at 100 traces teaches you more tha