PagerDuty Inc.

07/22/2024 | News release | Distributed by Public on 07/22/2024 13:30

Learning from Major Incidents: The Opportunities We’re Missing

While incidents are untimely, stressful, and likely to expose communication breakdowns, they can also be a powerful tool for learning and growth within an organization.

When a high-impact incident occurs (and it feels like we read about one in the news every week), the focus is often on two things: stabilizing the situation and controlling the narrative. What organizations often miss is the opportunity every incident presents: learning.

While every organization will say it supports learning, many haven't recognized the expertise it takes both to unearth the necessary data points and to disseminate those insights so that employees (and executives) can use them to grow.

Most organizations rely on a small group of people to jump in and start fixing the situation. They are the experts, and they can often figure out what needs to be done and whom to call on.

One of the best opportunities you have after an incident? Building more experts.

As an industry, we know that we tend to over-rely on the expertise of a few engineers and individuals during an incident. In fact, I bet if I asked you right now, you could easily rattle off the names of the five people you would call on in a major incident. My concern is that we're approaching this problem by trying to replace these people with GenAI, rather than using GenAI to teach more people in our organizations and grow beyond that group of five.

Typically, the expert resolving a situation doesn't even realize why or how they're doing what they're doing; it's second nature to them. If we can draw out the why and the how, we can use incidents to build a larger group of experts.

[Figure: "After": more expertise, which the organization can leverage to move with more resilience]

"Some developers of expert systems observed that highly skilled experts can carry out tasks without being aware of how or why they do what they do"

- Minding the Weather: How Expert Forecasters Think

Here are some quick tips for organizations after a large incident (for more thorough recommendations, I highly recommend reading our HOWIE post-incident guide):

  1. Separate the public incident review (whose goal is to build customer confidence) from the internal learning review. Yes, it's important to share information right away: a "day 1 flash" of sorts. However, there is also a "day 5 flash" and a "day 30 flash," in which you share what you have learned since (the day 30 flash can draw on insights from the internal post-incident review). It's important not to make promises in the day 1 flash. You're still in "learning mode," and premature promises can distract from how your organization actually improves.
  2. Have someone technical who did not participate in the incident conduct the internal incident review, interviews, and recommendations. This is important: the people who participated in the incident often have too much tunnel vision to unearth the full picture. (With PagerDuty, you can work with us on this, no product attached: just incident analysis experts who can help you extract insights that are hard to see when you're in the middle of it.)
  3. Take the time to understand the divide between the perspectives of executives and those of technical employees on the front lines. This divide can be exacerbated after a large event, so it's imperative that the person from item 2 collect both perspectives through their cognitive interviews. Closely evaluating this relationship allows the incident analyst to make recommendations that point to the best ROI.

No organization asks for large-scale, public, costly incidents. As PagerDuty's CEO said on CNBC, software is not perfect. Incidents and outages will happen, and GenAI alone won't fix that. GenAI tools combined with a deep, human-centric incident analysis process can help employees learn and make them more interested in evolving. Invest in your employees' training and development, and they're not only more likely to stay, they're also more likely to have the expertise needed to keep growing your business.