Managing LLM provider costs has become a chief concern for organizations building and deploying custom applications that consume services like OpenAI's. These applications often rely on multiple backend LLM calls to handle a single initial prompt, leading to rapid token consumption and, consequently, rising costs. But shortening prompts or chunking documents to reduce token consumption can be difficult, and it can introduce performance trade-offs, including an increased risk of hallucinations.
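To make that fan-out concrete, the sketch below uses the OpenAI Python SDK to handle one user question with two chained completions (a summarization step followed by an answer step) and tallies the tokens they consume. The two-step pipeline, model choice, and function names are illustrative assumptions, not a prescribed design.

```python
# Hypothetical example: a single user prompt fans out into multiple
# backend OpenAI calls (summarize, then answer), each consuming tokens.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def answer_question(document: str, question: str) -> str:
    total_tokens = 0

    # Call 1: condense the document so it fits the answer prompt
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize:\n{document}"}],
    )
    total_tokens += summary.usage.total_tokens

    # Call 2: answer the question against the summary
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": summary.choices[0].message.content},
            {"role": "user", "content": question},
        ],
    )
    total_tokens += answer.usage.total_tokens

    print(f"Tokens consumed for one user prompt: {total_tokens}")
    return answer.choices[0].message.content
```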
To maintain visibility into AI costs over time and identify optimizations, AI engineers and FinOps personnel need ways to monitor AI costs in terms of both token consumption and dollars spent. Datadog Cloud Cost Management (CCM) and LLM Observability work together to provide granular insights into your LLM applications' token usage and cost, helping you track the total cost of ownership of your generative AI services.
CCM now lets you break down your real (not estimated) OpenAI spend from the project or organization level down to individual models and their token consumption. And with LLM Observability, you can access a cost breakdown for every application in your environment, down to each individual LLM call in every prompt trace, all within a consolidated view of operational performance, model quality and safety, and application traces.
In this post, we'll explore how Cloud Cost Management and LLM Observability can help you understand the cost impact of your OpenAI services.
Datadog offers three OpenAI integrations that provide cost insights, all of which can be monitored within the out-of-the-box OpenAI Cost Overview dashboard.
The OpenAI integration's free API component provides organization-level visibility into usage patterns, operational metrics, token consumption, and cost breakdowns across models and operations. This gives you a 10,000-foot view of your account's metrics, including input and output token consumption, as well as detailed costs per model, operation, and token.
The out-of-the-box dashboard available with this integration also provides high-level metrics from Cloud Cost Management and LLM Observability; we'll go into more detail about each of these next.
Cloud Cost Management stores all the pricing information for OpenAI models to provide accurate, up-to-date information about your spend. The Explorer view shows real daily costs for each of your active models and enables you to filter spend data by organization, project, model, service name, and other tags.
These tags, along with others, are available out of the box with native OpenAI support in CCM. With Tag Pipelines, you can also add your own custom tags to support your specific configuration. For example, you might set up a Tag Pipeline that adds a team tag based on project ownership, making it easier to identify which teams are spending the most on OpenAI. And by creating monitors for your CCM metrics and filters, you can set up timely alerts that inform your platform engineers and FinOps staff when budgetary overages occur.
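As a rough sketch, an alert like this could also be created programmatically with Datadog's Python API client. The metric name (openai.cost.estimated), threshold, and team:ml-platform tag below are placeholders rather than the actual names in your account, and the generic query alert shown stands in for whatever monitor type fits your setup; check the Cloud Cost Management and monitors documentation for the exact metric and monitor type to use.

```python
# Hypothetical sketch: alert when a team's daily OpenAI spend crosses a budget.
# The metric name, tag, and threshold below are placeholders.
from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v1.api.monitors_api import MonitorsApi
from datadog_api_client.v1.model.monitor import Monitor
from datadog_api_client.v1.model.monitor_type import MonitorType

configuration = Configuration()  # reads DD_API_KEY and DD_APP_KEY from the environment

monitor = Monitor(
    name="OpenAI daily spend over budget (team:ml-platform)",
    type=MonitorType.QUERY_ALERT,
    query="sum(last_1d):sum:openai.cost.estimated{team:ml-platform} > 200",
    message="Daily OpenAI spend for team:ml-platform exceeded $200. @finops-team",
    tags=["team:ml-platform", "managed-by:finops"],
)

with ApiClient(configuration) as api_client:
    monitors_api = MonitorsApi(api_client)
    created = monitors_api.create_monitor(body=monitor)
    print(f"Created monitor {created.id}")
```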
CCM's granular cost metrics are also particularly useful when added to engineers' service health and performance dashboards. By putting OpenAI cost data in front of your engineers with dashboard widgets, you can encourage them to keep track of their AI spending and find ways to optimize.
With LLM Observability, users can investigate the root cause of issues, monitor operational performance, and evaluate the quality, privacy, and safety of LLM applications. LLM Observability shows cost data at various levels of granularity across its UI, from the entire application down to each trace and its constituent spans. The Applications view lets you inspect application-level cost metrics, breaking down costs by model, surfacing the most expensive span kinds, and graphing total costs over time. When you spot an expensive span kind or model that you want to investigate, you can immediately pivot to a filtered list of relevant spans using an embedded link.
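To get this data flowing, your application needs to be instrumented with LLM Observability. Below is a minimal sketch using the Python SDK in agentless mode; the ml_app name is a placeholder, and the enablement options shown are assumptions to verify against the LLM Observability setup documentation.

```python
# Minimal sketch: enable LLM Observability so supported LLM calls
# (such as OpenAI completions) are traced with token counts and cost.
from ddtrace.llmobs import LLMObs
from openai import OpenAI

LLMObs.enable(
    ml_app="example-chatbot",   # placeholder application name
    agentless_enabled=True,     # send data directly to Datadog, no local Agent
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's our refund policy?"}],
)
print(response.choices[0].message.content)
```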
You can see the input and output token count and cost figures for each trace in the Traces view, alongside other key metrics like duration and any triggered quality or safety checks. This enables you to quickly filter traces in the explorer to surface high-cost ones.
Each trace within LLM Observability contains spans providing a detailed breakdown of agent, tool, task, retrieval, and LLM call steps in the handling of the prompt. OpenAI request spans contain token count and cost figures, so you can break down the impact of each OpenAI call within a trace and find the culprits of unusually costly requests. This includes both user-entered prompts and system prompts formed on the backend for supplemental LLM calls. For example, the following screenshot shows cost data for an OpenAI request span submitting a system prompt to a chatbot.
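If you instrument your own workflow steps, each one shows up as its own span alongside the auto-instrumented OpenAI calls. The sketch below uses the Python SDK's span decorators; the function names and the account-lookup step are hypothetical, and it assumes LLM Observability has already been enabled as shown earlier.

```python
# Sketch: decorator-based spans, assuming LLMObs.enable() has already run.
from ddtrace.llmobs.decorators import tool, workflow
from openai import OpenAI

client = OpenAI()


@tool
def lookup_account(user_id: str) -> str:
    # Placeholder retrieval step; appears as a "tool" span in the trace
    return f"Account record for user {user_id}"


@workflow
def handle_prompt(user_id: str, question: str) -> str:
    context = lookup_account(user_id)
    # This backend OpenAI call appears as a child LLM span carrying
    # its own token counts and cost within the workflow's trace
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Use this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```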
By enabling you to troubleshoot high costs, errors, latency, security exposures, and model quality and safety issues in one place, LLM Observability offers an intuitive workflow for auditing the health, performance, cost, and security of your LLM applications. For more information about LLM Observability, see our blog post.
LLM provider spend is growing rapidly for many organizations that maintain AI-powered services. Through CCM, LLM Observability, and a comprehensive OpenAI integration, Datadog provides many ways for you to monitor your spend and prevent unexpected budget overruns.
For more information about CCM and LLM Observability, see the documentation for each of these features. If you haven't already, install the OpenAI integration to start tracking your OpenAI services. If you're brand new to Datadog, sign up for a free trial to get started.