Prodcast is Google's podcast about Site Reliability Engineering and production software.
Courtney Nash of The VOID discusses the role of human expertise in managing complex systems, and how SREs continue to bring critical value even as technology and AI evolve.
John Allspaw discusses reliability with Prodcast host Steve McGhee at SREcon Americas 2026.
John Allspaw joins Prodcast hosts Matt Siegler and Florian Rathgeber for a candid discussion of reliability topics at SREcon Americas 2026.
We speak with Ricard Bejarano about being an SRE at home, discussing Home Lab systems.
We sit down with Matt Zelesko, VP of SRE at Google, for a candid talk about how AI is changing SRE — and how it's not.
Sam Anderson shares his experiences with burnout, and how to support yourself as a reliable system. Sam provides guidance on how to deal with burnout, and some suggestions on how to avoid burnout through understanding yourself and finding the help and support you need.
Crisis Engineer Mikey Dickerson joins us to talk about what constitutes a crisis. Mikey draws on his broad experience across industry and the public sector, as well as on work with his team of systems fixers.
What's happening in the world of SRE and resilience engineering? Join us as we catch up with fellow podcast hosts Colette Alexander and Clint Byrum of the This Is Fine! podcast at SREcon in Seattle.
How do you introduce Site Reliability Engineering to an AI research lab, bringing concepts of scale to engineers who are at the leading edge of AI systems?
In the latest episode of The Prodcast, hosts Steve McGhee and Florian Rathgeber chat with Damion Yates, who helped establish the reliability engineering culture at Google DeepMind. Damion shares his journey of bringing scalable infrastructure to DeepMind, supporting massive mach...
Join us for a discussion with Carla Geisser of Layer Aleph, a company focused on "crisis engineering". Carla distinguishes a crisis from a standard incident by noting that a crisis is novel and lacks a playbook. She outlines five criteria for a true crisis: fundamental surprise, broken critical functions, high visibility, a rigid deadline (unlike internal tech deadlines), and perception breakdown. Crises often arise in organization...
This episode of the Prodcast tackles the challenges of maintaining AI safety and alignment in production. Guests Felipe Tiengo Ferreira and Parker Barnes join hosts Matt Siegler and Steve McGhee to discuss AI model safety, from examining content to emerging security risks. The discussion emphasizes the vital role of SREs in managing safety at scale, detailing multi-layered defenses, including system instructions, LLM classifiers, a...
In this episode of the Prodcast, guest Shannon Brady speaks with hosts Jordan Greenberg and Florian Rathgeber about managing Google's vast fleet of internal devices. Shannon explains how Google's Linux platform uses core SRE principles—specifically testing, canarying, and monitoring—for weekly stage rollouts of its Debian-based distribution. Configuration is efficiently managed using Puppet to ensure the right setup for...
Curious about the real impact of AI on Site Reliability Engineering? In this episode of The Prodcast, Google SRE Denia del Cid breaks down how her team is leveraging AI to transform production workflows.
Denia details practical applications like early outage detection, incident similarity analysis, and toil reduction. She explains the critical importance of validating against "golden data sets" and keeping humans in the loop to bui...
Join us on The Prodcast as we host Heather Adkins, leader of Google's Office of Cybersecurity Resilience, for a critical look at the future of digital defenses. We explore the intersection of SRE and security, unpacking the "Secure by Design" philosophy and the shared DNA of incident management.
Heather candidly discusses the rise of "Agentic AI hackers" and polymorphic malware, revealing how defenders c...
In this episode, we welcome Alex Hidalgo and Brian Singer of Nobl9 to discuss Service Level Objectives (SLOs). Alex and Brian talk about how SLOs can establish a vernacular across industry verticals, leading to constructive conversations and a shared understanding of how to implement SRE practices. Join us for a lively discussion that ranges across SLO topics!
In this episode, Steph Hippo, Platform Engineering Director at Honeycomb, joins The Prodcast to discuss AI and SRE.
Steph explains how observability helps us understand complex systems from their outputs, and provides a foundation for SRE to respond to system problems. This episode explains how AI and observability build a self-reinforcing loop.
We also discuss how AI can detect and respond to certain classes of inc...
In this special episode, hosts Steve McGhee from the Google SRE Prodcast and Kaslin Fields from the Google Kubernetes Podcast welcome Google Cloud Solutions Architect Ben Good to discuss platform engineering. Listeners can look forward to hearing about the role of Kubernetes as a tool for building platforms, how to create "golden paths" for developers, and the importance of observability and self-service in platform design. The con...
Google Staff SRE Ramón Llamas and Google Software Engineer Swapnil Haria join our hosts to explore how AI agents are revolutionizing production management, from summarizing alerts and finding hidden errors to proactively preventing outages. Learn about the challenges of evaluating non-deterministic systems and the fascinating interplay between human expertise and emerging AI capabilities in ensuring robust and reliable infrastructur...
This episode features Google Technical Program Manager (TPM) Karanveer Anand, who joins our hosts to discuss the unique role of TPMs in Site Reliability Engineering (SRE). The conversation highlights how SRE TPMs bridge the gap between technical details and business impact, managing complex projects with inter-team dependencies and ensuring system reliability, particularly in the rapidly evolving AI landscape.
This episode discusses Systems Theoretic Process Analysis (STPA), a method for analyzing complex systems. Theo Klein, a Google SRE, and Jeffrey Snover, a Distinguished Engineer at Google, explain that STPA focuses on identifying how system accidents and losses occur due to a loss of control, rather than component failures. STPA helps identify design flaws early, even before code is written! The discussion highlights that STPA is a ...