As AI agents and multimodal models become more prevalent, understanding how to evaluate GenAI is no longer optional – it's essential.
Generative AI introduces new complexities in assessment compared to traditional software. This week on Chain of Thought, we’re joined by Chip Huyen (Storyteller, Tép Studio) and Vivienne Zhang (Senior Product Manager, Generative AI Software, Nvidia) for a discussion on AI evaluation best practices.
Before we hear from our guests, Vikram Chatterji (CEO, Galileo) and Conor Bronsdon (Developer Awareness, Galileo) give their takes on the complexities of AI evals and how to overcome them. They cover the use of objective criteria for evaluating open-ended tasks, the role of hallucinations in AI models, and the importance of human-in-the-loop systems.
Afterwards, Chip and Vivienne sit down with Atin Sanyal (Co-Founder & CTO, Galileo) to explore common evaluation approaches, best practices for building frameworks, and implementation lessons. They also discuss the nuances of evaluating AI coding assistants and agentic systems.
Chapters:
00:00 Challenges in Evaluating Generative AI
05:45 Evaluating AI Agents
13:08 Are Hallucinations Bad?
17:12 Human in the Loop Systems
20:49 Panel discussion begins
22:57 Challenges in Evaluating Intelligent Systems
24:37 User Feedback and Iterative Improvement
26:47 Post-Deployment Evaluations and Common Mistakes
28:52 Hallucinations in AI: Definitions and Challenges
34:17 Evaluating AI Coding Assistants
38:15 Agentic Systems: Use Cases and Evaluations
43:00 Trends in AI Models and Hardware
45:42 Future of AI in Enterprises
47:16 Conclusion and Final Thoughts
Follow:
Vikram Chatterji: https://www.linkedin.com/in/vikram-chatterji/
Atin Sanyal: https://www.linkedin.com/in/atinsanyal/
Conor Bronsdon: https://www.linkedin.com/in/conorbronsdon/
Chip Huyen: https://www.linkedin.com/in/chiphuyen/
Vivienne Zhang: https://www.linkedin.com/in/viviennejiaozhang/
Show notes: Watch all of Productionize 2.0: https://www.galileo.ai/genai-productionize-2-0