Reliability Enablers


Software reliability is a tough topic for engineers in many organizations. The Reliability Enablers (Ash Patel and Sebastian Vietz) know this from experience. Join us as we demystify reliability jargon like SRE, DevOps, and more. We interview experts and share practical insights. Our mission is to help you boost your success in reliability-enabling areas like observability, incident response, release engineering, and more. read.srepath.com

Episodes

July 15, 2025 30 mins

A new or growing SRE team. A copy of the book. A company that says it cares about reliability. What happens next? Usually… not much.

In this episode, I sit down with Dave O’Connor, a 16-year Google SRE veteran, to talk about what happens when organizations cargo-cult reliability practices without understanding the context they were born in.

You might know him for his self-deprecating wit and legendary USENIX blurb about being “compli...


I know it’s already six months into 2025, but we recorded this almost three months ago. I’ve been busy with my foray into the world of tech consulting and training, and editing these podcast episodes takes time and care.

This episode was prompted by the 2025 Catchpoint SRE Report, which dropped some damning but all-too-familiar findings:

* 53% of orgs still define reliability as uptime only, ignoring degraded experience and hi...


Most teams talk about reliability with a margin for error. “What’s our SLO? What’s our budget for failure?”

But in the energy sector? There is no acceptable downtime. Not even a little.

In this episode, I talk with Wade Harris, Director of FAST Engineering in Australia, who’s spent 15+ years designing and rolling out monitoring and control systems for critical energy infrastructure like power stations, solar farms, SCADA networks, y...


Exploring how to manage observability tool sprawl, reduce costs, and leverage AI to make smarter, data-driven decisions.

It's been a hot minute since the last episode of the Reliability Enablers podcast.

Sebastian and I have been working on a few things in our realms. On a personal and work front, I’ve been to over 25 cities in the last 3 months and need a breather.

Meanwhile, listen to this interesting vendor, Ruchir Jha from Cardina...


Andrew Tunall is a product engineering leader focused on pushing the boundaries of reliability with a current focus on mobile observability. Using his experience from AWS and New Relic, he’s vocal about the need for a more user-focused observability, especially in mobile, where traditional practices fall short.

* Career Journey and Current Role: Andrew Tunall, now at Embrace, a mobile observability startup in Portland, Oregon, star...


Andrew Fong’s take on engineering cuts through the usual role labels, urging teams to start with the problem they’re solving instead of locking into rigid job titles. He sees reliability, inclusivity, and efficiency as the real drivers of good engineering.

In his view, SRE is all about keeping systems reliable and healthy, while platform engineering is geared toward speed, developer enablement, and keeping costs in check. It’s a va...




This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit read.srepath.com

Here’s what we covered:

Defining Platform Engineering

* Platform engineering: Building compelling internal products to help teams reuse capabilities with less coordination.

* Cloud computing connection: Enterprises can now compose platforms from cloud services, creating mature, internal products for all engineering personas.

Ankit’s career journey

* Didn't choose platform engineering; it found him.

* Early start in programming (since age...


Why many copy Google’s monitoring team setup

* Google’s Influence. Google played a key role in defining the concept of software reliability.

* Success in Reliability. Few can dispute Google’s ability to ensure high levels of reliability, or its willingness to share useful ways of improving reliability in other settings.

BUT there’s a problem:

* It’s not always replicable. While Google's practices are admired, they may not be a perfect fit for every te...


Monitoring in the software engineering world continues to grapple with poor signal-to-noise ratios. It’s a challenge that’s been around since the beginning of software development and will persist for years to come.

The core issue is the overwhelming noise from non-essential data, which floods systems with useless alerts.

This interrupts workflows, affects personal time, and even disrupts sleep.

Sebastian dove into this problem, hig...


The question then condenses down to: Can technical leads support reliability work? Yes, they can! Anemari has been a technical lead for years — even spending a few years doing that at the coveted consultancy, Thoughtworks — and now coaches others.

She and I discussed the link between this role and software reliability.



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus...
September 4, 2024 26 mins

We're already well into 2024 and it’s sad that people still have enough fuel to complain about various aspects of their engineering life.

DORA seems to be turning into one of those problem areas.

Not at every organization, but some places are turning it into a case of “hitting metrics” without caring for the underlying capabilities and conversations.

Nathen Harvey is no stranger to this problem.

He used to talk a lot about SRE at Goo...


We’ll explore 3 use cases for monitoring data. They are:

* Analyzing long-term trends

* Comparing over time or experiment groups

* Conducting ad hoc retrospective analysis

Analyzing long-term trends

You can ask yourself a couple of simple questions as a starting point:

* How big is my database?

* How fast is the database growing?

* How quickly is my user count growing?
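As a rough illustration of the first two questions, here is a minimal Python sketch that estimates a daily growth rate from two size samples and projects how long until a storage limit is reached. The sample figures, the function names, and the 200 GB capacity are hypothetical and not taken from the episode; a real analysis would pull many samples from your monitoring system rather than two hand-typed points.

```python
from datetime import date

def daily_growth(samples):
    """Average growth per day between the first and last sample.

    `samples` is a list of (date, size_gb) tuples sorted by date.
    """
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("samples must span at least one day")
    return (last_size - first_size) / days

def days_until_full(samples, capacity_gb):
    """Days until the latest size reaches capacity at the average rate.

    Returns None when the database is not growing.
    """
    rate = daily_growth(samples)
    if rate <= 0:
        return None
    return (capacity_gb - samples[-1][1]) / rate

# Hypothetical measurements: 100 GB on 1 January, 130 GB on 31 January.
samples = [(date(2024, 1, 1), 100.0), (date(2024, 1, 31), 130.0)]

print(daily_growth(samples))            # growth in GB per day
print(days_until_full(samples, 200.0))  # days left before 200 GB
```

The same two functions would work for user counts if you swap the size column for a count of users.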

As you get comfortable with analyzing data for the simpler questions...


Shlomo Bielak is the Head of Engineering (Operational Excellence and Cloud) at Penn Interactive, an interactive gaming company.

He’s dedicated much of his talk time at DevOps events to a topic rarely covered at such technical events. A lot of what he said alluded to ways to become a more valuable engineer.

I’ve broken them down into the following areas:

* Avoid the heroic efforts

* Mind + heart > Mind alone

* Curiosity > Cred...


Incident response is an increasingly difficult area for organizations. Many teams end up paying a lot of money for incident management solutions. However, issues remain because processes supporting the incident response are not robust.

Incident response software alone isn't going to fix bad incident processes.

It's gonna help for sure. You need these incident management tools to manage the data and communications within the incident...


According to Vlad Ukis, many enterprises organize their IT functions around ITIL. SRE serves a completely different purpose.

SRE is not for setting up the IT function. It is for enabling the product organization to operate online services reliably at scale.

However, the problem is that many in the industry are NOT using SRE principles but instead handing over complex services to a more traditio...


Sonja Blignaut is a complexity expert. That might not sound relevant to incident response in reliability engineering. But it is!

Our systems are becoming more complex and so are the resulting incidents.

Learning about complexity can help reliability folk go into an incident with less anxiety, which we’ll explore in this episode.

We'll explore the causes of complexity in incidents and how the Cynefin framework classifies incidents.

We'l...

July 30, 2024 9 mins

Do you have complete monitoring of your software in place? Are you sure? Google's SREs break monitoring down into white-box versus black-box monitoring.

It's not the same as internal versus external monitoring, which we'll explore further.

We'll cover topics like:

- (quickly) What is monitoring?

- What is white-box monitoring?

- What is black-box monitoring?

- The rising importance of black-box monitoring

This is a concept from Chapter 6 (M...


Jack Neely is a DevOps observability architect at Palo Alto Networks and has a few interesting ways of extracting value from o11y data.

We crammed a lot of ideas into just under 25 minutes, including these 7 takeaways:

* Reasserting the Need to Monitor Four Golden Signals: Focus on latency, traffic, errors, and saturation for effective system monitoring and management.

* Prioritize Customer Health: in Jack’s words, the 5th golden signal. Go beyond t...


Alert noise is no joke and neither is the fatigue that results from it. I spoke with Dan Ravenstone who gave a talk at Monitorama about this very topic.

He also happens to be an avid skateboarder!

Here are 9 takeaways from our conversation:

* Regularly Review and Update Monitoring Systems: Don’t set up monitoring once and forget about it. Continuously assess and update your monitoring systems to ensure they remain relevant and effect...


