Dynatrace Inc.

13/08/2024 | Press release | Distributed by Public on 14/08/2024 00:49

Dynatrace commitment to safe OneAgent releases: Protecting your production environment

Modern observability and security require comprehensive access to your hosts, processes, services, and applications in order to monitor system performance, conduct live debugging, and protect applications. This level of access enables advanced capabilities such as runtime instrumentation and detailed diagnostics. While these techniques are powerful, they can pose risks if not managed properly, as demonstrated by the recent CrowdStrike outage.

At Dynatrace, we've implemented a thorough and industry-proven approach to developing OneAgent® that minimizes such risks. Our approach encompasses all stages of the software development lifecycle, focusing on safeguarding OneAgent integration with your systems. Through rigorous testing, dependency management, continuous monitoring, and phased rollouts, we've prioritized the development of OneAgent with the highest possible reliability and security standards.

Because we adhere to these stringent processes, OneAgent is designed to operate smoothly and securely, minimizing the likelihood of disruptions and giving you greater confidence in your system's security.

Dynatrace OneAgent: Quick overview

Dynatrace OneAgent is a unified monitoring solution deployed across your IT environment. It automatically discovers and monitors each host's applications, services, processes, and infrastructure components. It injects monitoring code into your applications without manual configuration, which keeps monitoring and security coverage continuous and comprehensive. OneAgent provides end-to-end visibility, capturing real-time performance data and detailed metrics on CPU, memory, disk, network, and processes.

Safety measures across the entire software lifecycle

From development to rollout and production, we've developed safeguards for each phase of the software development lifecycle to prevent problems in your systems during updates. Let's dive into the details.

Figure 1. Safeguards are implemented for each phase of the software development lifecycle

End-to-end safety measures in the software development process

  • Separation of concerns: Each OneAgent component is designed to perform only one specific function. Critical and impactful components are minimized and undergo regular detailed reviews. Changes are introduced on a controlled schedule, typically once a week, to reduce the risk of affecting customer systems.
  • Dependency reduction: OneAgent development teams minimize the use of third-party or open source dependencies. Any required dependencies are thoroughly tested and fixed to specific versions, minimizing the risk of introducing untested code.
  • Rigorous testing: Engineers conduct extensive unit and integration tests on all code changes, covering individual functions and OneAgent performance with real-world applications. These tests are run on all supported operating systems and versions to enhance reliability.
  • Hardening phase: Before release, OneAgent undergoes a month-long hardening phase, during which repeated tests are conducted to uncover hidden issues. Our developers double-check these tests daily. The software is also deployed to internal test applications for real-world validation.
  • Artifact signing: OneAgent binaries are signed to prevent unauthorized changes. You can verify the signature during installation and for every update, ensuring code integrity.
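
The dependency reduction point above relies on every dependency being fixed to a specific version. As a minimal sketch of how such a rule could be checked automatically, the following Python snippet flags requirement lines that are not pinned to an exact version. The `name==version` line format, the function name, and the example package names are assumptions made for illustration; they do not describe Dynatrace's actual build tooling.

```python
import re

# Assumed line format: "name==1.2.3" (an exact version, nothing looser).
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9.]+$")


def unpinned_requirements(lines):
    """Return requirement lines that are not fixed to one exact version."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            problems.append(line)
    return problems


# Hypothetical dependency list; only the first entry is pinned.
example = ["libfoo==1.4.2", "libbar>=2.0", "# build tools", "libbaz"]
print(unpinned_requirements(example))
```

A check like this could run as a build gate, so that a floating version range never reaches a release candidate unnoticed.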

Reduced likelihood of failures in the OneAgent rollout process

  • Pre-rollout check: After the hardening phase, we thoroughly review each new OneAgent version with all teams to identify any known issues or concerns. We only proceed with a phased rollout when confident in a release.
  • Phased and controlled rollout: Each rollout is carefully staged, starting with internal environments, then moving to proofs-of-concept (POCs), trials, new customer environments, and finally to the broader customer base. We monitor OneAgent performance at each stage to ensure it behaves as expected.
  • Customer control: You can manually update OneAgent versions, prioritizing updates for critical applications or host groups while enabling auto-updates for less critical areas. You can also schedule updates during maintenance windows to minimize disruption.
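
The phased rollout described above can be pictured as a gate between stages: a version moves to the next audience only while the previous stage looks healthy. The sketch below illustrates that idea in Python. The stage labels follow the order given in the text, but the function name, the `is_healthy` callback, and the version string are hypothetical and are not taken from any Dynatrace API.

```python
from typing import Callable, List

# Stage order taken from the rollout description above; the labels are illustrative.
STAGES = ["internal", "poc", "trial", "new-customers", "all-customers"]


def run_rollout(version: str, is_healthy: Callable[[str, str], bool]) -> List[str]:
    """Advance stage by stage and stop at the first stage that fails its health check."""
    completed = []
    for stage in STAGES:
        # A real pipeline would deploy here and observe the stage for some time
        # before asking whether it is healthy.
        if not is_healthy(version, stage):
            break
        completed.append(stage)
    return completed


# Hypothetical health check that reports a problem during the trial stage.
print(run_rollout("1.0.0", lambda version, stage: stage != "trial"))
```

Stopping at the first unhealthy stage keeps a faulty version confined to the smallest audience that has seen it so far.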

Stay ahead of potential failures with production self-monitoring

  • 24/7 fully automated monitoring: We continuously monitor critical statistics, such as the number of connected OneAgents per technology, to swiftly address any significant issues. We also collect and analyze warning and severe log events to proactively address potential problems before they escalate into incidents.
  • Real-time health insights: Dynatrace provides immediate insights into your environment, helping you quickly identify the root cause of any issues. This enables you to clearly understand the health of your OneAgent deployment across your entire environment.
Figure 2. Stay ahead of OneAgent deployment issues with self-monitoring during production.
  • Automated issue collection: Details of potential issues are automatically collected and can be sent to Dynatrace for further analysis. Additionally, manual diagnostics can be performed to gather more specific information if needed.
Figure 3. Manual diagnostics reveal a critical error in an outdated version of OneAgent.
  • Fully automated analysis: When a problem is reported, regardless of its severity, we assess its potential impact on our customers' environments and take appropriate action as needed. This might include adding safety checks or rolling out an updated version to address the issue.
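
One statistic mentioned above is the number of connected OneAgents per technology. As a rough sketch of how a sudden drop in that number could be detected, the Python function below compares two snapshots and flags technologies whose count fell by more than a threshold. The 20% threshold, the data layout, and the technology names are assumptions chosen for the example, not values published by Dynatrace.

```python
def find_drops(previous, current, max_drop=0.2):
    """Return technologies whose connected-agent count fell by more than max_drop."""
    flagged = {}
    for tech, before in previous.items():
        after = current.get(tech, 0)  # a missing technology counts as zero agents
        if before > 0:
            drop = (before - after) / before
            if drop > max_drop:
                flagged[tech] = round(drop, 2)
    return flagged


# Hypothetical snapshots taken before and after an update wave.
previous = {"java": 1000, "dotnet": 400, "go": 50}
current = {"java": 990, "dotnet": 200}
print(find_drops(previous, current))
```

Using a relative rather than an absolute drop lets the same threshold apply to technologies with very different fleet sizes.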

Ensure system stability with our zero-impact policy

Dynatrace OneAgent is built with a focus on reliability and security, aiming to keep your systems stable and protected. We address potential risks proactively by applying rigorous standards across the entire software development lifecycle and maintaining continuous, real-time monitoring. This comprehensive strategy is designed to reduce the likelihood of disruptions, offering you greater confidence in the safety and stability of your IT environment.

While no system is entirely free from potential risks, our approach minimizes such risks and provides robust protection for your systems, helping to ensure smoother and more reliable operation.

For complete details about Dynatrace OneAgent, go to Dynatrace Documentation.

Gain insights into how Dynatrace developers develop observability code, continuously test across all supported environments, and avoid situations like the recent CrowdStrike outage.