
December 4, 2024 48 mins

As AI agents and multimodal models become more prevalent, understanding how to evaluate GenAI is no longer optional – it's essential. 

Generative AI introduces new complexities in assessment compared to traditional software, and this week on Chain of Thought we’re joined by Chip Huyen (Storyteller, Tép Studio) and Vivienne Zhang (Senior Product Manager, Generative AI Software, Nvidia) for a discussion on AI evaluation best practices.

Before we hear from our guests, Vikram Chatterji (CEO, Galileo) and Conor Bronsdon (Developer Awareness, Galileo) give their takes on the complexities of AI evals and how to overcome them. They cover the use of objective criteria in evaluating open-ended tasks, the role of hallucinations in AI models, and the importance of human-in-the-loop systems.

Afterwards, Chip and Vivienne sit down with Atin Sanyal (Co-Founder & CTO, Galileo) to explore common evaluation approaches, best practices for building frameworks, and implementation lessons. They also discuss the nuances of evaluating AI coding assistants and agentic systems.

Chapters:

00:00 Challenges in Evaluating Generative AI

05:45 Evaluating AI Agents

13:08 Are Hallucinations Bad?

17:12 Human in the Loop Systems

20:49 Panel discussion begins

22:57 Challenges in Evaluating Intelligent Systems

24:37 User Feedback and Iterative Improvement

26:47 Post-Deployment Evaluations and Common Mistakes

28:52 Hallucinations in AI: Definitions and Challenges

34:17 Evaluating AI Coding Assistants

38:15 Agentic Systems: Use Cases and Evaluations

43:00 Trends in AI Models and Hardware

45:42 Future of AI in Enterprises

47:16 Conclusion and Final Thoughts

Follow:

Vikram Chatterji: https://www.linkedin.com/in/vikram-chatterji/

Atin Sanyal: https://www.linkedin.com/in/atinsanyal/

Conor Bronsdon: https://www.linkedin.com/in/conorbronsdon/

Chip Huyen: https://www.linkedin.com/in/chiphuyen/

Vivienne Zhang: https://www.linkedin.com/in/viviennejiaozhang/


Show notes:

Watch all of Productionize 2.0: https://www.galileo.ai/genai-productionize-2-0
