September 30, 2025 7 mins

This episode discusses how large language models (LLMs) are evaluated and how they perform on complex software engineering tasks, with a focus on long-context capabilities. The first source, an excerpt from Simon Willison’s Weblog, praises the new Claude Sonnet 4.5 model for its code generation and details a complex SQLite database refactoring task that the model completed using its Code Interpreter feature. The second source, the abstract and excerpts from the LoCoBench academic paper, introduces a comprehensive benchmark that tests long-context LLMs at up to 1 million tokens across eight specialized software development task categories and 10 programming languages, arguing that existing benchmarks are inadequate for realistic, large-scale code systems. The paper reports that while models like Gemini 2.5 Pro may lead overall, other models, such as GPT-5, show specialized strengths in areas like Architectural Understanding. Finally, a Reddit post adds a practical perspective with real-world test results comparing Claude Sonnet 4 and Gemini 2.5 Pro on a large Rust codebase.


Podcast:
https://kabir.buzzsprout.com


YouTube:
https://www.youtube.com/@kabirtechdives

Please subscribe and share.
