In this episode, I sit down with Ankur Goyal, founder and CEO of Braintrust, the AI evals and observability platform used by teams like Notion, Stripe, Vercel, and Zapier. This one is for the senior engineers, staff engineers, VPs of engineering, and CTOs in my audience. We get into how coding agents can take on deeply technical architecture and infrastructure work that no single human engineer could tackle before, and then we demystify evals so you can use them to make your AI products better without touching the implementation.
What you’ll learn:
—
Brought to you by:
Guru—The AI layer of truth
Persona—Trusted identity verification for any use case
—
In this episode, we cover:
(00:00) Introduction to Ankur Goyal
(03:00) Using AI agents for database optimization
(06:10) Running exhaustive benchmarks with coding agents
(09:03) Why staff engineers are wrong about AI limitations
(11:30) The “agent line” framework for delegation
(14:00) Ankur’s workflow: running 4 to 6 concurrent agents
(17:16) Technical setup: foreground agents, background agents, and cloud environments
(20:32) Spending time with AI tools
(23:06) Demystifying evals
(26:02) Live demo: Building an eval for documentation answers
(30:20) The alternative to evals: vibe checks and whack-a-mole
(32:09) Capturing designer taste in scoring functions
(33:13) Quick recap
(33:44) Managing velocity and throughput
(35:40) Why CI/CD investment is critical for AI-accelerated teams
(37:30) Ankur’s prompting strategy when agents fail
(39:10) Closing thoughts and how to connect
—
Tools referenced:
• Braintrust: https://www.braintrust.dev/
• Codex: https://openai.com/codex/
• GPT 5.4: https://developers.openai.com/api/docs/models/gpt-5.4
• Claude: https://claude.ai/
—
Other references:
• GPT 5.5 just did what no other model could: https://www.lennysnewsletter.com/p/gpt-55-just-did-what-no-other-model
• Paul Graham’s Maker vs. Manager Schedule: http://www.paulgraham.com/makersschedule.html
• tmux: https://github.com/tmux/tmux
• Chris Tate at Vercel: https://www.linkedin.com/in/ctatedev/
—
Where to find Ankur Goyal:
LinkedIn: https://www.linkedin.com/in/ankrgyl/
—
Where to find Claire Vo:
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
—
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email jordan@penname.co.