The PM’s Role in AI Evals: Step-by-Step

1:34:33 Jul 11, 2025
About this episode
Today, we’ve got some of our most requested guests yet: Hamel Husain and Shreya Shankar, creators of the world’s best AI Evals cohort.

You’ll learn:
- Why AI evaluations are the most critical skill for building successful AI products
- What common mistakes people are making and how to avoid them
- How to effectively "hill climb" towards better AI performance

If you're building AI features, or aiming to master how AI evals actually work, this episode is your step-by-step blueprint.

----

Brought to you by:
- The AI Evals Course for PMs & Engineers: You get $800 with this link
- Jira Product Discovery: Plan with purpose, ship with confidence
- Vanta: Automate compliance, security, and trust with AI (Get $1,000 with my link)
- AI PM Certification: Get $500 with code AAKASH25

----

Timestamps:
00:00:00 - Preview
00:02:06 - Three reasons PMs NEED evals
00:04:40 - Why PMs shouldn't view evals as monotonous
00:06:23 - Are evals the hardest part of AI products solved?
00:07:37 - Why can't you just rely on human "vibe checks"?
00:12:11 - Ad 1 (AI Evals Course)
00:13:10 - Ad 2 (Jira Product Discovery)
00:14:06 - Are LLMs good at 1-5 ratings?
00:15:45 - The "Whack-a-mole" analogy without evals
00:16:26 - Hallucination problem in emails (Apollo story)
00:21:22 - How Airbnb used machine learning models
00:23:56 - Evaluating RAG systems
00:29:52 - Ad 3 (Vanta)
00:30:56 - Ad 4 (AI PM Certification on Maven)
00:31:42 - Hill climbing
00:35:51 - Red flag: Suspiciously high eval metrics
00:39:02 - Design principles for effective evals
00:42:42 - How OpenAI approaches evals
00:44:39 - Foundation models are trained on "average taste"
00:49:36 - Cons of fine-tuning
00:51:27 - Prompt engineering vs. RAG vs. fine-tuning
00:53:00 - Introduction of "The Three Gulfs" framework
00:56:04 - Roadmap for learning AI evals
01:01:41 - Why error analysis is critical for LLMs
01:08:29 - Using LLM as a judge
01:10:15 - Frameworks for systematic problem-solving in labels
01:17:42 - Importance of niche and qualifying clients (Pro tips)
01:18:43 - $800K for first course cohort!
01:20:15 - Why end a successful cohort?
01:25:49 - GOLD advice for creating a successful course
01:33:39 - Outro

----

Key Takeaways:
1. Stop