August 15, 2024 9 mins

Incident response is an increasingly difficult area for organizations. Many teams end up paying a lot of money for incident management solutions, yet issues remain because the processes supporting incident response are not robust.

Incident response software alone isn't going to fix bad incident processes.

It will help, for sure. You need incident management tools to manage the data and communications within an incident.

But you also need to have effective processes and human-technology integration. Dr Ukis wrote in his Establishing SRE Foundations book about complex incident coordination and priority setting.

According to Vladislav, the beginning of your SRE journey is not going to be focused on setting up an incident response process, but on core SRE artifacts like SLIs, availability measurement, and SLOs.

These are the core SRE concepts. They let a team say, for example, that it can now safely invest more in customer-facing features. Incident response comes into focus at some later point, once you've got these things more or less established in the organization.

Understanding and Leveraging SLOs

Once your Service Level Objectives (SLOs) are well-defined and refined over time, they should accurately reflect user and customer experiences. Your SLOs are no longer just initial metrics; they’ve been validated through production.

Product managers should now be able to use this data to make informed decisions about feature prioritization. This foundational work is crucial because it sets the stage for integrating a formal incident response process effectively.
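As a rough illustration of the kind of SLO data a product manager might look at, here is a minimal sketch in Python. The names `SloReport` and `evaluate_slo` are hypothetical, and the "remaining error budget" figure is one common way to summarize SLO data, assumed here for illustration rather than taken from the episode.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    availability: float      # measured SLI: fraction of good requests
    target: float            # SLO target, e.g. 0.999
    budget_remaining: float  # fraction of the error budget still unspent

    @property
    def slo_met(self) -> bool:
        return self.availability >= self.target


def evaluate_slo(good: int, total: int, target: float) -> SloReport:
    """Compare a measured availability SLI against its SLO target."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    availability = good / total
    # Failures the SLO tolerates over this window (assumed definition).
    allowed_failures = (1 - target) * total
    actual_failures = total - good
    # 1.0 means nothing spent; a negative value means the SLO is breached.
    budget_remaining = 1 - actual_failures / allowed_failures
    return SloReport(availability, target, budget_remaining)


if __name__ == "__main__":
    report = evaluate_slo(good=999_500, total=1_000_000, target=0.999)
    print(report.slo_met, round(report.budget_remaining, 3))
```

With roughly half of the budget still unspent, a team could argue that feature work is safe to continue. A negative value would be a signal to prioritize reliability instead.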

Implementing a Formal Incident Response

Before you overlay a formal incident response process, ensure that you have the cultural and technical groundwork in place.

Without this groundwork, the process is likely to be less effective. When the foundational SLOs and the organizational culture are strong, a well-structured incident response process can significantly improve how the organization handles incidents.

Coordinating During Major Incidents

When a significant incident occurs, detecting it through SLO breaches is just the beginning. You need a system in place to coordinate responses across multiple teams.

Consider appointing incident commanders and coordinators, as recommended in PagerDuty’s documentation, to manage this coordination. Develop a lightweight process to guide how incidents are handled.

Classifying Incidents

Establish an incident classification scheme to differentiate between types of incidents. This scheme should include priorities such as Priority One, Priority Two, and Priority Three.

Due to the inherently fuzzy nature of incidents, your classification system should also include guidelines for handling ambiguous cases. For instance, if uncertain whether an incident is Priority One or Two, default to Priority One.
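The tie-breaking guideline above can be sketched in a few lines of Python. The `Priority` enum and the `classify` function are hypothetical names for illustration. The sketch generalizes "if unsure between One and Two, pick One" to "pick the most urgent of the plausible priorities", which is an assumption.

```python
from collections.abc import Iterable
from enum import IntEnum


class Priority(IntEnum):
    P1 = 1  # Priority One: most urgent
    P2 = 2
    P3 = 3


def classify(candidates: Iterable[Priority]) -> Priority:
    """Return the most urgent priority among the plausible candidates.

    When responders cannot decide between two levels, they pass both
    and the more urgent one wins.
    """
    plausible = list(candidates)
    if not plausible:
        raise ValueError("at least one candidate priority is required")
    # Lower numeric value means higher urgency.
    return min(plausible)


if __name__ == "__main__":
    # Unsure whether this is Priority One or Two: default to Priority One.
    print(classify([Priority.P2, Priority.P1]).name)
```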

Deriving Actions from Incident Classification

Based on the incident classification, outline specific actions. For example, Priority One incidents might require immediate involvement from an incident commander.

The incident commander might then take the following actions:

* Create a communication channel, assemble relevant teams, and start coordination.

* Simultaneously inform stakeholders according to their priority group.

* Define stakeholder groups and establish protocols for notifying them as the situation evolves.
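One way to keep the link between classification and actions explicit is a small lookup table. The sketch below is a hypothetical example: only the Priority One steps come from the list above, while the Priority Two and Priority Three entries, the `RUNBOOK` name, and the `actions_for` function are placeholders invented for illustration.

```python
# Hypothetical runbook. Only the "P1" steps are taken from the text;
# the "P2" and "P3" entries are illustrative placeholders.
RUNBOOK: dict[str, list[str]] = {
    "P1": [
        "Involve an incident commander immediately",
        "Create a communication channel",
        "Assemble relevant teams and start coordination",
        "Inform stakeholders according to their priority group",
    ],
    "P2": ["Notify the owning team's on-call engineer"],
    "P3": ["File a ticket in the owning team's backlog"],
}


def actions_for(priority: str) -> list[str]:
    """Return a copy of the checklist for the given priority label."""
    try:
        return list(RUNBOOK[priority])
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None


if __name__ == "__main__":
    for step in actions_for("P1"):
        print("-", step)
```

Keeping the whole table this short is deliberate: it should stay readable at a glance during an incident.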

Keep Incident Response Processes Simple and Accessible

Ensure that your incident response process is concise and easily understandable. Ideally, it should fit on a single sheet of paper. Complexity can lead to confusion and inefficiencies, so aim for simplicity and clarity in your process.
