The Dynatrace Distributed Tracing app redefines how teams work with OpenTelemetry data. By combining OTel's comprehensive data collection with the Dynatrace platform, you gain unparalleled visibility into your application's behavior. The user-friendly interface simplifies the complexity of distributed traces, allowing you to pinpoint and resolve performance issues quickly. With out-of-the-box contextual analysis and the flexibility to dive deep into your data, Dynatrace empowers you to maximize the value of your OpenTelemetry implementation.
In a recent blog post, we announced the new Distributed Tracing app and demonstrated how it provides effortless trace insights. In this blog post, we'll walk you through a hands-on demo that showcases how the Distributed Tracing app transforms raw OpenTelemetry data into actionable insights.
To run this demo yourself, you'll need the following:
- A Kubernetes cluster with kubectl and Helm installed
- A Dynatrace tenant
- A Dynatrace API token for OTLP ingest
To set up the token, see Dynatrace API - Tokens and authentication in Dynatrace documentation.
Once your Kubernetes cluster is up and running, the first step is to create a secret containing the Dynatrace API token and the OTLP endpoint of your tenant. The OpenTelemetry collector will use these to send data to your Dynatrace tenant. The secret can be created using the following commands (replace the placeholders with your API token and environment ID):
API_TOKEN=<YOUR_API_TOKEN>
DT_ENDPOINT=https://<YOUR_ENVIRONMENT_ID>.dynatrace.com/api/v2/otlp
kubectl create secret generic dynatrace --from-literal=API_TOKEN=${API_TOKEN} --from-literal=DT_ENDPOINT=${DT_ENDPOINT}
After successfully creating the secret, the OpenTelemetry demo application can be installed using Helm. First, download the helm values file from the Dynatrace snippets repo on GitHub.
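For example, once you've located the raw URL of the values file in the snippets repo (the placeholder below stands in for it), you can download it with curl under the file name used in the install command that follows:
curl -fsSL -o otel-demo-helm-values.yaml <RAW_URL_OF_VALUES_FILE>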
This file configures the collector to send data to Dynatrace using the API token in the secret you created earlier. Then, use the following commands to install the demo application on your cluster:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install my-otel-demo open-telemetry/opentelemetry-demo --values otel-demo-helm-values.yaml
After invoking the helm install command, it may take a few minutes until all pods of the application are up and running; once they are, the OpenTelemetry collector starts sending data to your Dynatrace tenant.
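You can verify this by watching the pods come up and checking the collector's logs for export errors; the deployment name below is an assumption based on the release name my-otel-demo and the chart's defaults, so adjust it if your resources are named differently:
kubectl get pods --watch
kubectl logs deployment/my-otel-demo-otelcol --tail=50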
In your Dynatrace tenant, navigate to Dashboards.
On the Dashboards page, you can import a dashboard configuration as a JSON file using the Upload button. To install the OpenTelemetry Demo application dashboard, upload the JSON file, which can be downloaded here.
Once the dashboard is imported, you'll see several charts representing the application's overall health.
The Service Level Monitoring section contains several charts covering the health of the application's services.
These charts give you a quick overview of the overall application health, allowing you to quickly identify any services that aren't behaving as expected. In combination with the time series charts, they help you determine the point in time at which a service started to cause problems.
In addition to service-level monitoring, certain services within the OpenTelemetry demo application expose process-level metrics, such as CPU and memory consumption, number of threads, or heap size for services written in different languages.
Note that the developers of the respective services need to make these metrics available, for example by exposing them via a Prometheus endpoint from which the OpenTelemetry collector can scrape them and forward them to your Dynatrace tenant. Once the data is available in Dynatrace, DQL makes it easy to retrieve and visualize it on a dashboard.
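If you want to confirm that a service actually exposes such an endpoint, one quick check is to port-forward to it and look at the metrics exposition; the service name and port below are purely hypothetical placeholders, so substitute the ones from your deployment:
kubectl port-forward svc/<SERVICE_NAME> 8080:<METRICS_PORT>
curl -s http://localhost:8080/metrics | head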
Now, let's see how the dashboard can help you spot problems and find their root cause. For this purpose, we'll use the built-in failure scenarios included in the OpenTelemetry demo. To enable a failure scenario, we need to update the my-otel-demo-flagd-config ConfigMap, which contains the application's feature flags. Among the flags defined there is productCatalogFailure; change its defaultVariant from off to on, as shown below. After a couple of minutes, the effects of this change will be noticeable in the service level metrics as the failed spans start to increase.
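A minimal way to flip the flag, assuming the release name my-otel-demo from the install command above, is to edit the ConfigMap in place:
# Opens the ConfigMap in your editor; set the defaultVariant of productCatalogFailure to "on"
kubectl edit configmap my-otel-demo-flagd-config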
Also, in the Errored Spans with Logs table, you'll notice many entries that, judging by the log messages, relate to the retrieval of products. Since all requests from the load generator go through the frontend service, most logs related to failed spans are generated there. To pinpoint exactly where those requests are failing, use the trace.id field included in each table entry. Select a value in this column to go to the related distributed trace in the Dynatrace web UI.
Within the Distributed traces view, you get an overview of which services are involved in the errored trace and which of the child spans of the trace caused errors.
Here, notice that the error seems to be caused by the product service, particularly instances of the GetProduct call. Select the failed span to go to a detailed overview of the failed GetProduct request, including all attributes attached to the span, as well as a status description.
Here, you see that the status message indicates that the failures are related to the feature flag we changed earlier. However, only some GetProduct spans are failing, not all of them. Therefore, we need to investigate further by adding a specialized tile to our dashboard to evaluate whether the product ID impacts the error rate. For this, we use a DQL query that fetches all spans generated by the product service with the name oteldemo.ProductCatalogService/GetProduct and summarizes the number of errored spans by product ID.
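A sketch of what such a query might look like is shown below; the field names (service.name, request.is_failed, app.product.id) and the service name value are assumptions based on the OpenTelemetry demo's instrumentation and may differ in your environment, so adjust them to the attributes you actually see on your spans:
// field and service names below are assumptions; adjust to your span attributes
fetch spans
| filter service.name == "product-catalog"
| filter span.name == "oteldemo.ProductCatalogService/GetProduct"
| summarize total = count(), failed = countIf(request.is_failed == true), by: { app.product.id }
| sort failed desc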
This query confirms the suspicion that a particular product is the culprit: all the errors seem to be caused by requests for a specific product ID, which hints at a problem with that ID or a faulty entry in the product database.
Of course, this example is somewhat easy to troubleshoot as it's based on a built-in failure scenario. Still, it should give you an impression of how DQL enables you to investigate problems by analyzing how specific attributes attached to spans might affect the outcome of requests sent to a faulty service.
In this blog post, we explored how the Distributed Tracing app can be harnessed to visualize data ingested from the OpenTelemetry collector to get an overview of application health. This end-to-end tracing solution empowers you to swiftly and efficiently identify the root causes of issues. Enjoy unprecedented freedom in data exploration, ask questions, and receive tailored answers that precisely meet your needs.
This powerful synergy between OpenTelemetry and the Dynatrace platform creates a comprehensive ecosystem that enhances monitoring and troubleshooting capabilities for complex distributed systems, offering a robust solution for modern observability needs.
If you're new to Dynatrace and want to try out the Distributed Tracing app, check out our free trial.
We're rolling out this new functionality to all existing Dynatrace Platform Subscription (DPS) customers. As soon as the new Distributed Tracing Experience is available for your environment, you'll see a teaser banner in your classic Distributed Traces app.
If you're not yet a DPS customer, you can try out this functionality in the Dynatrace playground instead. You can even walk through the same example shown above.
If you're interested in learning more about the Dynatrace OTel Collector and its use cases, see the documentation.
This is just the beginning. So, stay tuned for more enhancements and features.
Make your voice heard after you've tried out this new experience. Provide feedback in the Distributed Tracing feedback channel (Dynatrace Community).