Datadog Inc.

11/27/2024 | News release | Distributed by Public on 11/27/2024 13:48

Orchestrion: Compile-time auto-instrumentation for Go


For the past couple of years, we at Datadog have been putting a lot of effort into automating the work of instrumenting your applications for APM. We've done that by using runtime-specific instrumentation techniques that allow SREs to enable distributed tracing through configuration or environment variables without modifying the original source code. This simplifies the tracing setup and often completely eliminates the need for manual instrumentation.

Unfortunately, not all runtimes provide such capabilities. Go, one of the most popular languages among our users, compiles into a native binary, which makes it difficult to inject instrumentation at runtime. Therefore, users historically needed to spend a lot of development time manually instrumenting their Go applications for APM.

That's why we created Orchestrion, a new tool that processes Go source code at compilation time and automatically inserts instrumentation to produce Datadog APM traces. This also enables support for Datadog Application Security Management Exploit Prevention to self-protect against common vulnerabilities.

In this post, we'll cover how Orchestrion and compile-time instrumentation work along with alternative approaches we considered, as well as offer a quick guide on getting started.

An introduction to Orchestrion

Orchestrion interfaces with the standard Go toolchain to inspect and modify the source code as it's being sent to the compiler. Manipulating the code at the Abstract Syntax Tree (AST) level means all changes done to the program are verified and type-checked by the Go compiler in the same way as any handwritten Go code. This allows Orchestrion unrestricted access to all behavior of the application, down to the standard library, while preventing a large class of errors that could have resulted from direct modification of the compiled binary. Since all code goes through the normal Go compiler, modifications are also not impaired by certain compiler optimizations (such as inlining), and the modified code goes through all usual compiler optimization passes, resulting in reduced runtime overhead.
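To make the type-checking point concrete, here is a minimal sketch using only the Go standard library (go/parser, go/ast, go/types, go/format). It is not Orchestrion's implementation: the instrument function, the demo source, and the calls counter are all hypothetical names chosen for illustration. It prepends a counter increment to every function body, then asks the type checker to validate the modified tree, so an invalid modification is rejected instead of producing a broken program.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"go/types"
)

// original is a hypothetical source file used only for this sketch.
const original = `package demo

var calls int

func Double(n int) int {
	return n * 2
}
`

// instrument parses src, prepends "<counter>++" to every function body,
// type-checks the modified AST, and returns the resulting source code.
func instrument(src, counter string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return "", err
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		stmt := &ast.IncDecStmt{X: ast.NewIdent(counter), Tok: token.INC}
		fn.Body.List = append([]ast.Stmt{stmt}, fn.Body.List...)
	}
	// The modified tree is checked like any handwritten code would be.
	conf := types.Config{}
	if _, err := conf.Check("demo", fset, []*ast.File{file}, nil); err != nil {
		return "", fmt.Errorf("type check failed: %w", err)
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := instrument(original, "calls")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Print(out)

	// Referring to an undeclared variable is caught by the type checker.
	if _, err := instrument(original, "missing"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

The second call in main refers to a variable that does not exist in the package, so the sketch reports an error rather than emitting code that would fail later.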

Orchestrion also inserts Go //line directives in the modified source code so that line numbering is not affected by the modifications, and stack traces produced by instrumented applications point to the correct location in the original source code.
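The effect of a //line directive can be observed with the standard go/parser and go/token packages. The sketch below assumes a hypothetical generated file (generated.go) whose directive points back to a hypothetical original.go; it compares the position reported with and without the directive applied.

```go
package main

import (
	"errors"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// generated stands in for a rewritten file. The //line directive says that
// the line following it corresponds to line 10 of original.go.
const generated = `package demo

//line original.go:10
func Handler() {}
`

// positions returns the position of the first function declaration twice:
// once adjusted by //line directives and once as it appears in the file.
func positions(src string) (adjusted, raw token.Position, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "generated.go", src, parser.ParseComments)
	if err != nil {
		return adjusted, raw, err
	}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			adjusted = fset.PositionFor(fn.Pos(), true)
			raw = fset.PositionFor(fn.Pos(), false)
			return adjusted, raw, nil
		}
	}
	return adjusted, raw, errors.New("no function declaration found")
}

func main() {
	adjusted, raw, err := positions(generated)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("adjusted: %s line %d\n", adjusted.Filename, adjusted.Line)
	fmt.Printf("raw:      %s line %d\n", raw.Filename, raw.Line)
}
```

The adjusted position refers to the original file and line, which is the information that ends up in stack traces when the compiler sees the same directive.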

Orchestrion is built on a framework inspired by Aspect-oriented Programming (AoP), where code modifications are specified by pairing a join point, which selects the parts of the AST to be modified, with advice describing the modifications to be made. This makes it easy to write new integrations for Orchestrion, and code-level modifications are a lot easier to reason about than binary-level instrumentation.

Why we chose compile-time instrumentation

Before choosing compile-time instrumentation as our new approach, we considered two alternative techniques that are being used in the industry: binary patching and eBPF.

We define binary patching as a set of techniques that involve modifying the machine code and memory of a compiled application in order to inject instrumentation code and propagate trace and span IDs. For eBPF, we're referring to the approach of accomplishing the same thing using uprobes and eBPF programs that write to user space memory.

Our research revealed different strengths and weaknesses for each technique:

Criterion | Binary patching | eBPF | Orchestrion
Safety, reliability, data quality | Fail | Marginal | Pass
Automation | Pass | Pass | Marginal
Performance overhead | Pass | Marginal | Pass
Supported environments | Marginal | Fail | Pass
Capabilities | Marginal | Fail | Pass

While the table above offers a simplified overview, the reality of comparing the different techniques is very complex and relies on a lot of assumptions. We could dedicate several articles to this topic, but for now we'll try to cover the most important aspects we considered.

Safety, reliability, and data quality

When it comes to safety, reliability, and data quality, we focused on the risks of harming the instrumented application or producing incorrect or missing data. Go uses an optimizing compiler that produces binaries containing a scheduler, a garbage collector, and various built-in data structures. Binary patching requires careful reverse engineering of these components in order to hook into the execution of different functions within the application. Small mistakes can easily produce the wrong data, crash the application, or even corrupt data. Given the complexity and constant evolution of the compiler, runtime, and targeted libraries, we assigned a moderate probability for such issues to arise over time in practice.

eBPF reduces these risks by relying on the uprobe kernel mechanism for hooking into function execution as well as executing most of the instrumentation code in a safe virtual machine inside of the kernel. However, uprobes still carry a small risk of crashing the application. And perhaps more importantly, eBPF still requires writing to user memory in order to propagate trace and span IDs, exposing it to the same data corruption risks as binary patching.

Level of automation

One compelling strength of binary patching and eBPF is the level of automation they offer. For both approaches, it is sufficient to deploy a single Agent on the host system in order to instrument all deployed applications. Orchestrion requires a small change to the build process and a redeployment of the application itself, making it slightly less automated.

Performance overhead

For performance overhead, eBPF falls slightly behind because the firing of uprobes requires context switching between user space and the kernel, which can be prohibitive for hot code paths. We're aware of efforts to overcome this by implementing eBPF in user space; such approaches would match the performance of binary patching, but also come with the associated risks.

Supported environments

eBPF is generally limited to Linux environments where elevated privileges are available, which rules out serverless environments such as AWS Lambda and Fargate. Additionally, both eBPF and binary patching require architecture-specific implementations. This often makes it commercially unviable to support environments other than amd64 and arm64.

Overall capabilities

Last but not least, we consider eBPF restrictive in terms of overall capabilities because the uprobe mechanism does not allow us to block function calls in order to protect the security of the instrumented application. Binary patching is in theory unlimited when it comes to capabilities, but in practice implementing them comes with increased risks due to the complex interactions with the Go runtime as well as the fact that the additional logic executes in user space where it might crash the application.

Ultimately, we had to choose between the level of automation and the associated risks for our customers. Our philosophy is that safety and reliability should always come first, which is why we created Orchestrion. However, we will continue to evaluate alternative approaches as they develop and mature.

Orchestrion for security

Code-level operations allow Orchestrion to inject instrumentation that can alter the control flow of the program at decisive points, which makes it possible to implement Runtime Application Self-Protection (RASP) features allowing applications to self-protect against common vulnerabilities such as SQL injection or local file inclusion (both OWASP Top-10 entries). Such features cannot be built with eBPF-based solutions, as these are limited to observing the application.

The ability to entirely substitute a particular API with another also means developers are no longer required to think about passing a context.Context value through all the layers of their business logic solely for the purpose of allowing trace context chaining. This can be done transparently for them at compilation time.

Getting started with Orchestrion

Run the following command to install and set up Orchestrion:

go install github.com/DataDog/orchestrion@latest

Note: Ensure $(go env GOBIN) or $(go env GOPATH)/bin is in your $PATH.

Then, register Orchestrion in your project's go.mod:

orchestrion pin

Commit changes to your version control system:

git commit -m "chore: enable orchestrion" go.mod go.sum orchestrion.tool.go

Next, use one of the following two methods to enable Orchestrion in your build process:

Prepend Orchestrion to your usual go commands:

orchestrion go build .
orchestrion go run .
orchestrion go test ./...

Modify the $GOFLAGS environment variable to inject Orchestrion, and use go commands normally:

# Make sure to include the quotes as shown below, as these are required for
# the Go toolchain to parse GOFLAGS properly!
export GOFLAGS="${GOFLAGS} '-toolexec=orchestrion toolexec'"
go build .
go run .
go test ./...
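The -toolexec flag tells the go command to run each toolchain program (such as compile or link) through the given wrapper, passing the tool's path and its arguments to that wrapper. The sketch below is a minimal, hypothetical wrapper that only reports what it sees before handing control to the real tool. It is not how Orchestrion is implemented; parseInvocation and run are names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// parseInvocation splits the arguments received from the go command into
// the tool's base name and the Go source files listed among its arguments.
func parseInvocation(args []string) (tool string, goFiles []string) {
	if len(args) == 0 {
		return "", nil
	}
	tool = strings.TrimSuffix(filepath.Base(args[0]), ".exe")
	for _, arg := range args[1:] {
		if strings.HasSuffix(arg, ".go") {
			goFiles = append(goFiles, arg)
		}
	}
	return tool, goFiles
}

// run reports compiler invocations, then executes the real tool unchanged
// and returns its exit code.
func run(args []string) int {
	if len(args) == 0 {
		// Nothing to wrap: print usage and leave.
		fmt.Fprintln(os.Stderr, "usage: wrapper <tool> [args...]")
		return 0
	}
	if tool, goFiles := parseInvocation(args); tool == "compile" {
		// A real instrumenting wrapper would rewrite these files here.
		fmt.Fprintf(os.Stderr, "compile invoked with %d Go file(s)\n", len(goFiles))
	}
	cmd := exec.Command(args[0], args[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			return exitErr.ExitCode()
		}
		fmt.Fprintln(os.Stderr, err)
		return 1
	}
	return 0
}

func main() {
	os.Exit(run(os.Args[1:]))
}
```

Built as a binary named wrapper (a hypothetical name), it could be tried with go build -toolexec=wrapper . in place of the Orchestrion command shown above.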

Instrument your Go applications today

Orchestrion simplifies your monitoring by instrumenting your Go applications at build time, enabling you to quickly get started with Datadog APM. For more information on this new tool, visit our documentation. If you're new to Datadog, get started with a 14-day free trial.