This week on Fresh From the Labs, we're looking past the leaderboards and hype to explore the real-world challenges and limitations of today's AI.
Can AI actually run a company? We dive into recent CMU research that put AI agents to the test, revealing significant struggles with common-sense tasks and with complex automation such as using a web browser effectively.
The conversation unpacks the performance of specific models like o3, contrasting benchmark achievements with practical usability and the ever-present issue of AI hallucinations. We discuss the dangers these hallucinations pose, especially in critical applications, where they can subtly mislead users and create more work. We also explain why simply topping a leaderboard (thanks, Goodhart's Law!) doesn't guarantee success on your specific problem.
Join Shilpa, Jared, and Kevin as they discuss the trial-and-error reality of model selection, the importance of truly understanding the problem you're solving, and why promising developments like local models might offer a path past some of these current hurdles. It's a candid look at where AI excels and where it still falls short.
Link to Dr. Anthony Diamond's blog post on o1: https://www.psl.com/feed-posts/o1-an-entirely-different-animal---buyer-beware