
How we use Scorecards to define and communicate best practices at scale

In modern, distributed applications, shared standards for performance and reliability are key to maintaining a healthy production environment and providing a dependable user experience. But establishing and maintaining these standards at scale can be a challenge: when you have hundreds or thousands of services overseen by a wide range of teams, there are no one-size-fits-all solutions. How do you determine effective best practices in such a complex environment? And how do you track whether or not services are consistently meeting your benchmarks throughout the ongoing development of your application? Monitoring is key, of course. But when you have hundreds or thousands of services, how do you ensure that each of them is effectively monitored in the first place? And what about before a service has been built? How do you enforce best practices for development, observability, and so on from the beginning of the software development life cycle?

With Scorecards, a feature of the Datadog Service Catalog, organizations can gauge everything from the performance and observability to the documentation and security of their services, guided by industry standards as well as custom rules, and provide actionable feedback to service owners on an ongoing basis.

In this post, we'll explore how Scorecards have helped SREs at Datadog provide robust guidelines to our service owners at scale, throughout the software development life cycle, by defining best practices in collaboration with a wide range of teams.

Before Scorecards: Tracking production-readiness and adherence to best practices

Before implementing Scorecards, Datadog relied on an entirely manual production-readiness review (PRR) process. This involved a member of our SRE team sitting down with each of our service owners to walk through a lengthy list of checks (related to everything from instrumentation and data security to standards for API and database usage) as they prepared for launch. Service owners had to make adjustments based on any checks that were not met, and SREs had to keep track of their work and comprehensively review the service again before it could be deployed.

This wasn't a scalable process. As Datadog grew and our services multiplied, so did our list of checks for production-readiness. Meanwhile, the SREs in charge of this process needed increasingly broad knowledge of many different specialized aspects of our platform. What's more, each of our services had to pass our PRR checks just once. But once they had, there was significant potential for these services to stray from the standards enforced by PRR in the course of ongoing development. Another complicating factor was the evolution of these standards themselves: while many of our PRR checks covered perennial best practices (for instrumentation, security, and documentation, for example), others, such as those tied to the implementation of specific frameworks, could change.

With that in mind, we wanted to go beyond one-off reviews and augment the monitoring already implemented by service owners by continuously evaluating our services' adherence to best practices throughout the software development life cycle. That's where Scorecards came in.

A distributed approach to continuously evaluating our services

Scorecards stemmed from the Datadog Service Catalog, which consolidates knowledge of an organization's services by providing information on their performance, reliability, and ownership in a central location. Scorecards were designed to enable any and all qualified stakeholders to provide guidelines for services and give actionable feedback to service owners.

The ability to define best practices in a collaborative and distributed way is key to scalability. At Datadog, we manage more than 8,000 discrete internal services. Given the scale and complexity of our systems, we inevitably have a lot of internal specialization. As a result, while our internal implementation of Scorecards is overseen by our SRE team, it has been a highly distributed effort: rules for different aspects of our services have been set in consultation with a wide range of stakeholders.

To roll out Scorecards internally, we first set out to identify which teams could benefit most from them, and how. We began with the types of personas that stood to gain the most from the ability to provide guidelines for our services:

  • Security and reliability teams, who seek to ensure compliance and minimize the risks posed by potential data breaches throughout our applications.
  • API and database platform teams, who seek to standardize usage in order to prevent technical debt.
  • Really, any team providing an internal service or library. Developers today use a greater diversity of libraries than ever before, and lack of guidance on best practices for library usage often leads to technical debt.

We also considered service owners' priorities at a high level:

  • Avoiding surprises related to security, reliability, or performance
  • Concentrating on their business domain in order to focus on optimization
  • Knowing and using the best tools for their jobs
  • Designing and launching with predictable timelines

And management's priorities for our services:

  • Avoiding surprises related to security, reliability, or performance
  • Tracking adherence to best practices in order to understand how well-positioned we are to avoid unwelcome surprises

Developing our Scorecard rules

After identifying the types of stakeholders that would be involved and considering their priorities at a high level, we set to work identifying which specific teams at Datadog were best positioned to set policy around each facet of our PRR process. These teams became our initial rule providers, defining standards for our services within their respective areas of expertise. SREs spent time speaking with these rule providers (teams like Security and Infrastructure) to understand the standards important to them and establish the benchmarks that services would need to meet in order to be considered production-ready.

Broadly speaking, our rules cover topics such as security, deployment practices, observability, chaos engineering, and documentation:

  • Our security rules cover things like the handling of sensitive data and the definition of network policies for Kubernetes (a minimal sketch of one such check follows this list).
  • Our deployment practice rules cover things like the usage of staged rollouts and feature flags.
  • Our observability rules cover things like the implementation of deployment tracking and the correlation of APM and logs.
  • Our chaos engineering rules check for the existence of fault-injection testing.
  • Our documentation rules cover things like the existence of simple service overviews and in-depth technical documentation, as well as the definition of code repositories, on-call team members, and runbooks.
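
To make this concrete, here is roughly what a single check of this kind might look like. This is a minimal sketch rather than our actual rule implementation: it assumes the official kubernetes Python client, a hypothetical convention that each service runs in a Kubernetes namespace matching its name, and a made-up service called web-store.

```python
# A minimal sketch of a single security check: does a service's Kubernetes
# namespace define at least one NetworkPolicy? The namespace-per-service
# convention and the service name are hypothetical; this is not Datadog's
# internal rule implementation.
from kubernetes import client, config


def network_policy_defined(service_name: str) -> bool:
    """Return True if the service's namespace contains at least one NetworkPolicy."""
    # Use config.load_incluster_config() instead when running inside a cluster.
    config.load_kube_config()
    networking = client.NetworkingV1Api()
    policies = networking.list_namespaced_network_policy(namespace=service_name)
    return len(policies.items) > 0


if __name__ == "__main__":
    # Hypothetical service name for illustration.
    print(network_policy_defined("web-store"))
```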

Delegating the definition of these rules to the teams that specialize in these areas has helped us ensure that the standards we use to evaluate our services are applicable, relevant, and aligned with industry standards. Enabling these teams to define custom rules via the Scorecards API allows them to directly disseminate clear and actionable guidance to service owners on an ongoing basis.
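
To illustrate what defining a custom rule through the API might look like, here is a minimal sketch using plain HTTP. The /api/v2/scorecard/rules endpoint and the DD-API-KEY and DD-APPLICATION-KEY headers come from Datadog's public API reference; the JSON payload field names and the example rule (a hypothetical database best practice) are illustrative and should be confirmed against the current Scorecards API documentation.

```python
# Minimal sketch: registering a custom Scorecard rule over HTTP. The endpoint
# path and authentication headers come from Datadog's public API reference;
# the payload fields and the example rule are illustrative and should be
# checked against the current Scorecards API documentation.
import os

import requests

API_ROOT = "https://api.datadoghq.com"

headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    "Content-Type": "application/json",
}

# A hypothetical rule that a database platform team might provide.
payload = {
    "data": {
        "type": "rule",
        "attributes": {
            "name": "Use the shared connection pooler",
            "scorecard_name": "Database Best Practices",
            "description": "Services that talk to Postgres should route through the shared pooler.",
            "enabled": True,
        },
    }
}

response = requests.post(f"{API_ROOT}/api/v2/scorecard/rules", headers=headers, json=payload)
response.raise_for_status()
rule_id = response.json()["data"]["id"]  # referenced later when reporting outcomes
print(f"Created rule {rule_id}")
```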

Giving actionable feedback to service owners

When it comes to results, we use a central evaluation engine owned by our SRE team to continuously evaluate our rules. This helps us ensure consistency and prevent redundancy at scale. Including these results alongside other key service information in the Service Catalog has helped us fit Scorecards into our teams' existing workflows.
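
To sketch the reporting side of this workflow, an evaluation job can push its results to the Scorecards API's batch outcomes endpoint. As before, this is an illustrative sketch rather than our internal evaluation engine: the rule ID, service name, payload field names, and remarks are placeholders meant to show the shape of the exchange, and they should be checked against the current API documentation.

```python
# Sketch of how an evaluation job might report results through the Scorecards
# API's batch outcomes endpoint. The endpoint and headers come from Datadog's
# public API reference; the payload field names, rule ID, and remarks are
# illustrative placeholders rather than our internal evaluation engine.
import os

import requests

API_ROOT = "https://api.datadoghq.com"

headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    "Content-Type": "application/json",
}


def report_outcome(rule_id: str, service_name: str, passed: bool, remarks: str) -> None:
    """Send a single pass/fail outcome for one (rule, service) pair."""
    payload = {
        "data": {
            "type": "batched-outcome",  # JSON:API type string; confirm against current docs
            "attributes": {
                "results": [
                    {
                        "rule_id": rule_id,
                        "service_name": service_name,
                        "state": "pass" if passed else "fail",
                        # Remarks explain the outcome and point to a next step.
                        "remarks": remarks,
                    }
                ]
            },
        }
    }
    response = requests.post(
        f"{API_ROOT}/api/v2/scorecard/outcomes/batch", headers=headers, json=payload
    )
    response.raise_for_status()


# Hypothetical usage, reusing the NetworkPolicy check sketched earlier:
# report_outcome("abc-123", "web-store", passed=False,
#                remarks="No NetworkPolicy found in the web-store namespace; see the network policy runbook.")
```

In a setup like this, the remarks field carries the explanation and next step that service owners see alongside each outcome.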

We also generate Scorecard reports that are sent to team Slack channels. These reports provide ongoing updates on how services are measuring up to expected standards, summarizing the highest- and lowest-scoring rules, services, and teams. They may be scoped to a specific team's services or cover every service defined in the Service Catalog.

By using Scorecards to evaluate our services throughout the software development life cycle, we've been able to:

  • Surface best practices early in the development process
  • Reduce unexpected setbacks and keep service launch timelines on track
  • Continuously assess the production-readiness of services as new features are developed

Rolling out Scorecards at scale

We've introduced Scorecards progressively in order to avoid overwhelming service owners and to give them time to share their feedback with us. As such, our rollout of Scorecards is ongoing: in an organization of our size, it will take some time to reach all of our teams.

We know that adoption depends on a culture shift: we're not providing value if we're just adding to cognitive load. To promote adoption of Scorecards, we're careful to communicate the "why" of each rule and its outcomes. While ideally the value of our rules and the reasons for their passing or failing outcomes are self-evident, service owners have competing priorities, and it is incumbent on our SRE team to ensure that these details are clearly stated. Including detailed descriptions for our Scorecard rules is key to this.

Detailed Scorecard rule descriptions help service owners implement and understand established best practices.

Clarifying the reasoning behind each outcome and indicating actionable next steps in the remarks field of each rule is also key.

The remarks field of each Scorecard rule allows for a detailed explanation of its passing or failing outcome.

In our rule descriptions, we are particularly careful to emphasize the time savings that satisfying each rule can provide. For example, a rule recommending the adoption of a particular framework can spare service owners from having to worry about an entire set of other rules, because adopting the framework addresses those concerns for them.

Some of our rules, such as those that improve security, are a base-level requirement for all of our services. But as we've already noted, when you have thousands of independent services, there can be no one-size-fits-all solutions. As such, we work with service owners to refine our rules and manage exceptions on an ongoing basis. Like any growing organization, we manage an evolving catalog of services, and this evolution necessitates evolving standards and processes for performance, reliability, observability, and security.

Measuring results

As our rollout of Scorecards continues, we're keeping an eye on a few metrics:

  • Number of revisions per service architecture RFC document. This number should decrease as Scorecards help us promote best practices for design early in the software development life cycle.
  • Time from starting PRR to service launch. This interval should diminish as Scorecards help us reduce the occurrence of unexpected roadblocks.
  • Amount of manual input required during PRR. We are gauging the human effort required before service launch with the goal of automating box-ticking as much as possible.
  • Time required to bring rule outcomes to passing. This will help us measure the difficulty of adopting improved libraries, frameworks, and best practices.

So far, all of these metrics are pointing in the right direction, and stakeholders across Datadog have attested to the effectiveness of Scorecards in communicating centralized, up-to-date guidance at scale as our systems evolve. Incorporating this guidance directly in the Service Catalog, which is already integral to our service owners' development workflows, has helped strengthen communication between (and save time for) many of our teams.

Optimizing our services and strengthening governance at scale

At Datadog, Scorecards have played a critical role in helping us establish and maintain important standards for security, reliability, and performance at scale. With the involvement of a growing number of teams, we are continuously evaluating our thousands of independent services according to both industry-defined best practices and fine-tuned custom rules for internal usage. This has helped us eliminate knowledge silos and provide our teams with robust, up-to-date guidelines throughout the software development life cycle. It has also helped SREs and company leadership assess our services' compliance with expected standards and best practices without relying on manually compiled reports. Meanwhile, as our rollout continues, SREs have been working closely with service owners and the Service Catalog team to help drive improvements to Scorecards for our customers.

You can learn more about Scorecards and how teams throughout your organization can customize their own elsewhere on our blog, and check out our documentation to get started. And if you're new to Datadog, you can sign up for a 14-day free trial.