Software reliability is a tough topic for engineers in many organizations. The Reliability Enablers (Ash Patel and Sebastian Vietz) know this from experience. Join us as we demystify reliability jargon like SRE, DevOps, and more. We interview experts and share practical insights. Our mission is to help you boost your success in reliability-enabling areas like observability, incident response, release engineering, and more. read.srepath.com
What if agentic AI makes SRE more important, not less? Bennett Gould explains why autonomous AI systems may increase the demand for reliability thinking.
Everyone seems to think AI is coming for SRE in a hard way.
You might have heard the same story:
“AI will write the code.”
“Agents will handle incidents.”
“Copilots will generate the runbooks.”
“Automation will reduce operational...
What if the hardest part of reliability has nothing to do with tooling or automation? Jennifer Petoff explains why real reliability comes from the human workflows wrapped around the engineering work.
Everyone seems to think AI will automate reliability away.
I keep hearing the same story:
“Our tooling will catch it.”
“Copilots will reduce operational load.”
“Automation will mitigate incidents before the...
A new or growing SRE team. A copy of the book. A company that says it cares about reliability. What happens next? Usually… not much.
In this episode, I sit down with Dave O’Connor, a 16-year Google SRE veteran, to talk about what happens when organizations cargo-cult reliability practices without understanding the context they were born in.
You might know him for his self-deprecating wit and legendary USENIX blurb about ...
I know it’s already six months into 2025, but we recorded this almost three months ago. I’ve been busy with my foray into the world of tech consulting and training, and, well, editing these podcast episodes takes time and care.
This episode was prompted by the 2025 Catchpoint SRE Report, which dropped some damning but all-too-familiar findings:
* 53% of orgs still define reliability as uptime only, ignoring degraded...
Most teams talk about reliability with a margin for error. “What’s our SLO? What’s our budget for failure?”
But in the energy sector? There is no acceptable downtime. Not even a little.
In this episode, I talk with Wade Harris, Director of FAST Engineering in Australia, who’s spent 15+ years designing and rolling out monitoring and control systems for critical energy infrastructure like power stations, ...
Exploring how to manage observability tool sprawl, reduce costs, and leverage AI to make smarter, data-driven decisions.
It's been a hot minute since the last episode of the Reliability Enablers podcast.
Sebastian and I have been working on a few things in our realms. On a personal and work front, I’ve been to over 25 cities in the last 3 months and need a breather.
Meanwhile, listen to this interesting vendor, Ruchir Jha from C...
Andrew Tunall is a product engineering leader focused on pushing the boundaries of reliability with a current focus on mobile observability. Using his experience from AWS and New Relic, he’s vocal about the need for a more user-focused observability, especially in mobile, where traditional practices fall short.
* Career Journey and Current Role: Andrew Tunall, now at Embrace, a mobile observability startup in Portland, Oregon...
Andrew Fong’s take on engineering cuts through the usual role labels, urging teams to start with the problem they’re solving instead of locking into rigid job titles. He sees reliability, inclusivity, and efficiency as the real drivers of good engineering.
In his view, SRE is all about keeping systems reliable and healthy, while platform engineering is geared toward speed, developer enablement, and keeping costs in chec...
Here’s what we covered:
Defining Platform Engineering
* Platform engineering: Building compelling internal products to help teams reuse capabilities with less coordination.
* Cloud computing connection: Enterprises can now compose platforms from cloud services, creating mature, internal products for all engineering personas.
Ankit’s career journey
* Didn't choose platform engineering; it found him.
* Early start in programmin...
Why many copy Google’s monitoring team setup
* Google’s Influence. Google played a key role in defining the concept of software reliability.
* Success in Reliability. Few can dispute Google’s ability to ensure high levels of reliability and to share useful ways to improve it in other settings.
BUT there’s a problem:
* It’s not always replicable. While Google's practices are admired, they may not...
Monitoring in the software engineering world continues to grapple with poor signal-to-noise ratios. It’s a challenge that’s been around since the beginning of software development and will persist for years to come.
The core issue is the overwhelming noise from non-essential data, which floods systems with useless alerts.
This interrupts workflows, affects personal time, and even disrupts sleep.
Sebastian dove into this ...
The question then boils down to: Can technical leads support reliability work? Yes, they can! Anemari has been a technical lead for years, including a few years at the coveted consultancy Thoughtworks, and now coaches others.
She and I discussed the link between this role and software reliability.
We're already well into 2024 and it’s sad that people still have enough fuel to complain about various aspects of their engineering life.
DORA seems to be turning into one of those problem areas.
Not at every organization, but some places are turning it into a case of “hitting metrics” without caring for the underlying capabilities and conversations.
Nathen Harvey is no stranger to this problem.
He used to talk a lo...
We’ll explore 3 use cases for monitoring data. They are:
* Analyzing long-term trends
* Comparing over time or experiment groups
* Conducting ad hoc retrospective analysis
Analyzing long-term trends
You can ask yourself a couple of simple questions as a starting point:
* How big is my database?
* How fast is the database growing?
* How quickly is my user count growing?
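The first two questions can be answered with very little data. Here is a minimal sketch, assuming you already record the database size at regular intervals. The sample values, the function names, and the 200 GB capacity are all hypothetical and only there for illustration.

```python
from datetime import date

# Hypothetical samples: (date taken, database size in GB). Illustrative values only.
samples = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 31), 130.0),
    (date(2024, 3, 1), 160.0),
]


def growth_per_day(samples):
    """Average growth in GB per day between the first and last sample."""
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0:
        raise ValueError("need samples spanning at least one day")
    return (last_size - first_size) / elapsed_days


def days_until_full(samples, capacity_gb):
    """Days until capacity_gb is reached at the average rate, or None if not growing."""
    rate = growth_per_day(samples)
    if rate <= 0:
        return None
    remaining_gb = max(capacity_gb - samples[-1][1], 0.0)
    return remaining_gb / rate


if __name__ == "__main__":
    print("Current size (GB):", samples[-1][1])
    print("Growth (GB/day):", growth_per_day(samples))
    print("Days until 200 GB:", days_until_full(samples, 200.0))
```

A straight line between the first and last sample is the simplest possible trend. It ignores seasonality and sudden jumps, so treat the result as a starting point for the conversation rather than a forecast.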
As you get comfortable with analyzing data for the simpler que...
Shlomo Bielak is the Head of Engineering (Operational Excellence and Cloud) at Penn Interactive, an interactive gaming company.
He has dedicated much of his speaking time at DevOps events to a topic rarely covered at such technical events. A lot of what he said alluded to ways to become a more valuable engineer.
I’ve broken them down into the following areas:
* Avoid the heroic efforts
* Mind + heart > Mind alone
* C...
Incident response is an increasingly difficult area for organizations. Many teams end up paying a lot of money for incident management solutions. However, issues remain because processes supporting the incident response are not robust.
Incident response software alone isn't going to fix bad incident processes.
It's gonna help for sure. You need these incident management tools to manage the data and communications within the incident...
According to Vlad Ukis, many enterprises organize their IT functions around ITIL. SRE serves a completely different purpose.
SRE is not for setting up the IT function. It is for enabling the product organization to operate online services reliably at scale.
However, the problem is that many in the industry are NOT using SRE principles but instead handing over complex services to a more traditio...
Sonja Blignaut is a complexity expert. That might not sound relevant to incident response in reliability engineering. But it is!
Our systems are becoming more complex and so are the resulting incidents.
Learning about complexity can help reliability folk go into an incident with less anxiety, which we’ll explore in this episode.
We'll explore the causes of complexity in incidents and how the Cynefin framework classifies incident...
Have you got complete monitoring of your software in effect? Are you sure? Google's SREs break monitoring down to white box versus black box monitoring.
It's not the same as internal versus external monitoring, which we'll explore further.
We'll cover topics like:
- (quickly) What is monitoring?
- What is white box monitoring?
- What is black box monitoring?
- The rising importance of black box monitoring
This is a concept from Chapter 6 (M...