Splunk Inc.

07/02/2024 | News release | Distributed by Public on 07/02/2024 09:22

Uncomplicate SLOs to Deliver Digitally Resilient Systems and Better Customer Experiences

If your organization has an observability practice, it's likely that the end goal was to increase system reliability and customer satisfaction. But balancing reliability needs with the need to innovate to meet ever-increasing customer expectations remains a challenge for most. Many businesses have turned to Service Level Objectives (SLOs), which have been shown to help align the entire organization on business KPIs for reliability and customer experience and drive better data-driven decision making while also delivering cost savings.

One 2023 study found that 96% of organizations are already utilizing SLOs to meet their goals for resilience and customer satisfaction. According to the Nobl9 2023 State Of SLOs Report, 90% of companies indicated that SLOs helped them make better business decisions, 76% indicated that SLOs helped them maintain resilience, while 27% reported savings of more than $500K due to SLO implementation.

Clearly, SLOs can make a big impact, but they can be complicated. We've heard from customers that reaching alignment and using SLOs effectively across the organization remains a challenge. That's why Splunk has simplified SLOs for Splunk Observability Cloud users, so they can quickly adopt a functioning SLO framework and reap the benefits of an SLO practice. The new built-in SLO management experience in Splunk Observability Cloud offers intuitive SLO creation with insight into a service's current performance. This helps users select realistic thresholds, simplifies SLO creation and management, and lets teams standardize on best practices as they use SLOs across their organization.

Unraveling the SLO Framework

Business leaders want to know that their teams are prioritizing the right things. Are they focusing on reliability and resilience vs. innovation at the right times? But it can be difficult to understand whether each team is making choices that align with the needs of the business. While organizations have a lot of data and teams can create dashboards and alerts to keep track of their own priorities, they are often not aligned on how they make these trade-offs. This makes it difficult to have conversations about what they're prioritizing and why.

SLOs give organizations a framework to align the way they talk about service reliability and performance. By following this framework, teams in an organization can speak a common language when they review their reliability and performance. When everyone is speaking the same language, SLOs make it easier for leaders to understand service performance and reliability and therefore understand the decisions their teams are making.

What's in an SLO?

An SLO defines a target for an SLI (Service Level Indicator) and a compliance period over which that target should be met. Generally, an SLO contains:

  • an SLI - a quantitative measurement of the health of a service. This is best understood as a metric or a combination of metrics. SLIs can be:
    • Request based: counting individual events, such as successful requests.
    • Time window based: counting time windows and classifying them as good or bad based on criteria defined by the user.
  • a target - the level the SLI is expected to meet, such as a percentage of successful requests, and
  • a compliance period - compliance periods can be calendar windows (monthly, quarterly) or rolling windows (past 30 days).
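
To make these components concrete, here is a minimal Python sketch of an SLO and the two kinds of SLI described above. It is an illustration only: the names `SLO`, `request_based_sli`, `window_based_sli`, and `is_met` are hypothetical and are not part of any Splunk Observability Cloud API, and treating an empty period as fully compliant is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical container for the three parts of an SLO."""
    name: str
    target: float          # e.g. 0.995 means 99.5% of events should be good
    compliance_days: int   # length of a rolling compliance window, in days


def request_based_sli(good_requests: int, total_requests: int) -> float:
    """Request-based SLI: fraction of individual requests that were good."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as compliant
    return good_requests / total_requests


def window_based_sli(window_is_good: list[bool]) -> float:
    """Time-window-based SLI: fraction of time windows classified as good."""
    if not window_is_good:
        return 1.0  # assumption: no windows counts as compliant
    return sum(window_is_good) / len(window_is_good)


def is_met(slo: SLO, sli: float) -> bool:
    """An SLO is met when the measured SLI is at or above the target."""
    return sli >= slo.target


if __name__ == "__main__":
    checkout = SLO(name="checkout-success", target=0.995, compliance_days=30)
    sli = request_based_sli(good_requests=9_990, total_requests=10_000)
    print(checkout.name, "met:", is_met(checkout, sli))
```

Either kind of SLI reduces to a ratio of good to total, which is why the same target and compliance period can apply to both.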

SLOs Simplified in Splunk Observability Cloud

SLOs in Observability Cloud are based on an indicator metric (SLI), which can be a standard service or custom metric, a compliance period, and a target. Creating, managing and standardizing SLOs in Splunk Observability Cloud is simplified with the new SLO page and SLO Creation Wizard.

Starting from the new SLOs tab on the Detectors & SLOs page, users can quickly see a list of all existing SLOs with the details and status at a glance. From this page, users can check the status of each SLO or create a new one.

Creating an SLO in Observability Cloud

By selecting Create SLO, users get step-by-step guidance to create an SLO.

  1. First, you select the indicator metric and type. Currently, users can create request-based SLOs for success or latency.
  2. Next, you'll define your target and compliance window. The system will calculate the total and remaining error budget based on the defined SLO, and provide a quick visualization of failed vs. successful requests over the selected compliance window to help you find the right SLO target and view the time windows where the SLO status was impacted.
  3. Once the SLO is defined, you can select when and how to be notified. You can set up simple alerts on error budget consumption or SLO breach and predictive alerts based on burn rate.
  4. The final step is to name and save the SLO.
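
The error budget and burn rate mentioned in the steps above can be sketched with the commonly used definitions for a request-based SLO. This is a rough sketch, not a description of how Splunk Observability Cloud computes these values internally. The function names are hypothetical, and the formulas assume the budget is the number of requests allowed to fail in the compliance window.

```python
def error_budget(target: float, total_requests: int) -> float:
    """Number of requests allowed to fail in the window for a given target."""
    return (1.0 - target) * total_requests


def remaining_error_budget(target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Budget left after subtracting observed failures (negative = breached)."""
    return error_budget(target, total_requests) - failed_requests


def burn_rate(target: float, total_requests: int,
              failed_requests: int) -> float:
    """Observed error rate divided by the error rate the target allows.

    A value of 1.0 means the budget would be used up exactly at the end of
    the compliance window if the current error rate continued; values above
    1.0 mean it would run out early.
    """
    if total_requests == 0:
        return 0.0  # assumption: no traffic burns no budget
    observed_error_rate = failed_requests / total_requests
    allowed_error_rate = 1.0 - target
    return observed_error_rate / allowed_error_rate


if __name__ == "__main__":
    # Illustrative numbers only: a 99% target over 10,000 requests, 40 failed.
    print("total budget:", error_budget(0.99, 10_000))
    print("remaining:", remaining_error_budget(0.99, 10_000, 40))
    print("burn rate:", burn_rate(0.99, 10_000, 40))
```

An alert on burn rate is predictive in the sense that it fires on the pace of budget consumption rather than waiting for the budget to be exhausted.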

Adding SLI Charts to Dashboards

You can also add your SLIs to custom dashboards in Observability Cloud to easily keep track of the status of your SLOs, share them across teams, and streamline troubleshooting when an incident occurs. From the SLOs tab, click the three-dot menu on an SLO and choose Add to Dashboard in the pop-up to add the selected SLI as a chart to a new or existing dashboard.

Get Alignment on the Things That Matter

SLOs can serve many purposes across the organization to help you deliver digitally resilient systems and flawless customer experiences. Whether you need to provide visibility across the organization on user experience and service-level agreements with customers, monitor burn rate and error rates so you can meet team goals, or gain insight on performance issues to make better development decisions, establishing an SLO practice is the starting point. SLO management is available today for all Splunk Observability Cloud customers at no additional cost.

We're committed to helping you reap these benefits, and we're continuing to refine and improve the SLO management experience for our users. Visit our product documentation to learn more or get started with a Splunk Observability trial today to test out the experience.