Miguel Luna

Composing OpenTelemetry Reference Architectures

A conceptual framework for reasoning about OpenTelemetry Collector architectures — edge, processing, and resilience layers that compose into the right pipeline for your environment.

Most OpenTelemetry tutorials end at the same place: an application instrumented with the SDK, exporting traces to a single collector, forwarding to a backend. It works. Then production happens.

Traffic grows. Teams want metrics derived from traces. The backend goes down for maintenance and you lose an hour of telemetry. A compliance requirement means PII must be stripped before data leaves the cluster. Suddenly, that single collector isn't enough — and the question becomes: what should the architecture actually look like?

The OpenTelemetry Collector is designed to be composed. It can run in multiple deployment modes, be chained into pipelines, and scaled independently at each stage. But the documentation describes individual components, not how to think about assembling them. That thinking is what this article is about.

What follows is a conceptual framework for reasoning about collector architectures — not a set of rigid templates. The building blocks described here are reference points. In practice, they combine, overlap, and adapt to your constraints. A tail sampling tier might also need Kafka-backed resilience. A gateway might absorb the role of a sampling tier at low volumes. The goal is to understand the concepts well enough to compose the right architecture for your situation, not to pick a pre-built one off a shelf.

Three conceptual layers

It helps to think about collector architectures in three layers: edge, processing, and resilience. These aren't physical tiers that must exist as separate deployments — they're categories of concern. A single collector can address multiple layers. A complex deployment might have several components within one layer. The layers are a thinking tool, not a deployment diagram.

Edge: how telemetry enters the pipeline

The edge layer is about the first hop — how telemetry gets from your applications and infrastructure into the pipeline. At this stage, the collector gathers data in two fundamentally different ways. Pull-based receivers like filelog and hostmetrics actively reach out to collect data — tailing log files on disk or scraping system-level metrics from the host. Push-based receivers like otlp listen for data sent to them — applications instrumented with OpenTelemetry SDKs export traces, metrics, and logs directly to the collector's OTLP endpoint. A single edge collector typically runs both: pull receivers for infrastructure telemetry the application doesn't know about, and push receivers for application telemetry the SDK produces. There are several common deployment patterns, and the right one depends on your environment and what you need to collect.
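As a sketch, a minimal edge collector combining both modes might look like the following. The gateway endpoint and log path are placeholders for a Kubernetes node agent; adapt them to your environment.

```yaml
receivers:
  otlp:                        # push: SDKs export to this endpoint
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  filelog:                     # pull: tail container log files on disk
    include: [/var/log/pods/*/*/*.log]
  hostmetrics:                 # pull: scrape system-level metrics from the host
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      disk:

exporters:
  otlp:
    endpoint: gateway.observability.svc:4317   # placeholder next hop
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp, hostmetrics]   # application + infrastructure metrics
      exporters: [otlp]
    logs:
      receivers: [otlp, filelog]       # application + container logs
      exporters: [otlp]
```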

DaemonSet Agent — One OpenTelemetry Collector per Kubernetes node, deployed as a DaemonSet. Applications export to the agent running on the same node (typically via status.hostIP:4317 using the Kubernetes Downward API). The agent also tails container log files from disk via the filelog receiver and scrapes host-level metrics via the hostmetrics receiver. This is the most common Kubernetes pattern because it handles both application and infrastructure telemetry with a single deployment, and applications only need to know about localhost.
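Wiring applications to the node-local agent typically uses the Downward API, as mentioned above. A sketch of the pod spec fragment — the variable names are conventional, not mandatory:

```yaml
# Pod spec fragment: export to the DaemonSet agent on the same node
env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP       # the node's IP via the Downward API
  - name: OTEL_EXPORTER_OTLP_ENDPOINT  # standard SDK environment variable
    value: "http://$(NODE_IP):4317"    # expands NODE_IP declared above
```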

Sidecar Agent — One OpenTelemetry Collector per pod, deployed as a sidecar container. Each service gets its own collector with a custom configuration. This is required on managed container platforms like AWS Fargate or Azure Container Apps where DaemonSets aren't available, and it's useful when services have different processing requirements. When running alongside a DaemonSet, the sidecar handles application telemetry while the DaemonSet independently collects node-level telemetry — applications don't send to both.

Host Agent — A standalone OpenTelemetry Collector running as a systemd service on bare-metal or VM hosts. It serves the same role as the DaemonSet agent but outside Kubernetes: collecting host metrics, tailing log files, and receiving OTLP from local applications.

Direct SDK Export — Applications export directly to the next stage (gateway or backend) with no local collector. This is the simplest option but only works when you don't need infrastructure collection. For log collection, the recommended pattern is still to write to stdout and use a collector with the filelog receiver — even if the SDK is exporting traces and metrics directly.

These patterns aren't mutually exclusive. A Kubernetes cluster might run DaemonSet agents for infrastructure collection alongside sidecars for services that need custom processing. A VM environment might use host agents for some services and direct SDK export for others. The edge layer is about matching the collection pattern to the workload, not picking one pattern for everything.

Processing: central policy, sampling, and transformation

Not every architecture needs a processing layer. If your edge collectors can export directly to your backend and you don't need centralized policy, you can skip it in favour of simplicity. But several scenarios push you toward central processing — and the way you address them can range from a single gateway to a multi-stage pipeline.

Centralized policy (Gateway) — A pool of OpenTelemetry Collectors that sits between edge collectors and the backend. This is where you enforce consistent filtering, transformation, and PII redaction across all services. It's also where you manage backend credentials — edge collectors export to the gateway over OTLP, and only the gateway holds the API keys. Credential isolation is often the primary reason teams add a gateway.
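A sketch of the split: edge collectors export plain OTLP, while only the gateway's config references the backend credentials. The backend endpoint, attribute name, and environment variable are illustrative.

```yaml
# Gateway collector: the only place backend credentials live
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Example policy: strip a PII-bearing attribute before data leaves
  attributes:
    actions:
      - key: user.email
        action: delete

exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder backend
    headers:
      Authorization: "Bearer ${env:BACKEND_API_KEY}"   # key never reaches the edge

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes]
      exporters: [otlp]
```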

Replica count scales with data volume. At low volumes (under 1K events/sec), two replicas co-located with workloads are sufficient. At medium volumes, 3–5 replicas on a dedicated node pool. At high volumes, 5–20+ replicas, potentially in a separate cluster. These are rules of thumb; adapt them to your environment, since load varies significantly with payload type and size.

Tail-based sampling — Sampling decisions that consider the complete trace (e.g., "keep all traces with errors, sample 10% of successful traces") require that all spans of a trace reach the same collector instance. This is achieved with the loadbalancingexporter using routing_key: traceID, which consistently routes spans from the same trace to the same downstream collector.
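The first tier's exporter config might look like this — the headless service name is a placeholder, and Kubernetes resolution keeps the downstream list current as the sampling tier scales:

```yaml
exporters:
  loadbalancing:
    routing_key: traceID        # all spans of a trace route to one downstream instance
    protocol:
      otlp:                     # settings applied to each resolved backend
        tls:
          insecure: true
    resolver:
      k8s:
        service: sampling-tier.observability   # placeholder headless service
```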

There's a critical subtlety here: if you're deriving span metrics (RED metrics) from traces using the spanmetrics connector, the derivation must happen before sampling. Otherwise, your metrics only reflect the sampled subset, not the true traffic. The correct pattern splits the pipeline within the sampling stage:

  1. Receive traces, derive spanmetrics from 100% of traffic, forward via a forward connector.
  2. Apply tail_sampling to the forwarded traces, export only kept traces.
  3. A separate metrics pipeline exports the derived RED metrics.

This ensures accurate metrics regardless of your sampling rate.
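Sketched as collector pipelines — the sampling policies and backend endpoint are illustrative:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder

connectors:
  spanmetrics: {}   # derives RED metrics from every span it sees
  forward: {}       # hands traces to the next pipeline unchanged

processors:
  tail_sampling:
    policies:
      - name: keep-errors          # keep all traces with errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-ok            # sample 10% of successful traces
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

service:
  pipelines:
    traces/in:                     # step 1: derive metrics from 100% of traffic
      receivers: [otlp]
      exporters: [spanmetrics, forward]
    traces/sampled:                # step 2: sample only the forwarded traces
      receivers: [forward]
      processors: [tail_sampling]
      exporters: [otlp]
    metrics/derived:               # step 3: export the unsampled RED metrics
      receivers: [spanmetrics]
      exporters: [otlp]
```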

The key point about processing is that these capabilities — gateway policy, tail sampling, span metrics derivation — are not separate products or fixed modules. They're configurations of the same OpenTelemetry Collector. At low volumes, a single gateway deployment might handle policy enforcement, sampling, and metrics derivation all at once. At high volumes, you might split them into dedicated stages for independent scaling. The architecture adapts to your scale, not the other way around.

Resilience: what happens when the backend is down

The resilience layer determines how much data you're willing to lose during backend outages or collector restarts. This isn't a separate tier you bolt on — it's a property you apply to any stage of the pipeline.

In-Memory Queues — The default. The collector's sending_queue retries failed exports with exponential backoff. If the collector process crashes or restarts, queued data is lost. This is acceptable for development and for workloads where some data loss during incidents is tolerable.

Persistent Queues (WAL) — The file_storage extension writes queued data to disk before export. If the collector crashes, it resumes from where it left off after restart. In Kubernetes, this requires a PersistentVolumeClaim. This is the right choice for most production workloads — it survives collector restarts and brief backend outages without the operational complexity of an external message bus.
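Enabling the write-ahead log is a small config change — the directory is a placeholder that should point at the PVC mount in Kubernetes:

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/queue    # placeholder; back with a PVC in Kubernetes

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder
    sending_queue:
      storage: file_storage              # persist the queue via the extension
    retry_on_failure:
      enabled: true

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```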

Kafka Buffer — An external Kafka cluster sits between collectors and the backend. Producer collectors write to Kafka topics; consumer collectors read from Kafka and export to the backend. This provides the strongest durability guarantee — Kafka can buffer hours of telemetry during extended outages and enables replay. But it adds significant operational complexity.
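A sketch of the two halves, with placeholder broker and topic names — exact field names vary between collector versions, so treat this as the shape rather than a drop-in config:

```yaml
# Producer collector: writes telemetry to Kafka instead of the backend
exporters:
  kafka:
    brokers: [kafka-0.kafka:9092]   # placeholder broker list
    topic: otlp_spans               # placeholder topic
    encoding: otlp_proto

# Consumer collector: reads from Kafka and exports to the backend
receivers:
  kafka:
    brokers: [kafka-0.kafka:9092]
    topic: otlp_spans
    encoding: otlp_proto
```

Because the producer acknowledges writes once Kafka has them, the backend can be down for hours while data accumulates in the topic; the consumers drain the backlog when it returns.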

The important thing to understand is that resilience is orthogonal to the other layers. You can add persistent queues to an edge agent, a gateway, or a sampling tier. You can put Kafka in front of a gateway, in front of a sampling tier, or in front of the backend. A tail sampling deployment that needs to survive extended outages might use Kafka-backed ingestion — combining what might look like two separate "modules" into a single stage. The building blocks compose freely based on what you need to protect against.

Where to start with your architecture

The Agent + Gateway two-tier pattern is the de facto production standard for running OpenTelemetry at scale. DaemonSet agents on every node handle local collection — pulling infrastructure telemetry via filelog and hostmetrics, receiving application telemetry via OTLP — while a centralized gateway pool enforces policy, manages credentials, and exports to the backend. Persistent queues (WAL) on the gateway protect against backend outages without external dependencies.

Every other configuration either simplifies this pattern or extends it. Smaller environments might drop the gateway and export directly from agents. Larger ones might add a tail sampling tier with traceID-based load balancing, a Kafka buffer for extended resilience, or span metrics derivation before sampling. The building blocks described in the previous sections — edge, processing, resilience — are the modules you add or remove from this foundation.

The key is to start with the two-tier pattern and evolve incrementally:

  • Need credential isolation or centralized PII redaction? You already have the gateway.
  • Need tail-based sampling? Add a load-balancing exporter and a sampling tier between agents and gateway.
  • Need hours of buffer during extended outages? Insert Kafka between agents and the processing tier.
  • Running on Fargate or Azure Container Apps? Swap DaemonSet agents for sidecars — the rest of the pipeline stays the same.

Start here. Add modules as your needs grow. The architecture adapts to your scale, not the other way around.

Decision points that shape your architecture

When designing a collector architecture, these are the questions that determine which patterns you need:

  • Do I need infrastructure telemetry (host metrics, disk logs)? — Determines whether you need a local collector or can use direct SDK export
  • Am I on a managed container platform (Fargate, ACA)? — Forces the sidecar pattern instead of a DaemonSet
  • Do I need centralized filtering, PII redaction, or credential isolation? — Adds a gateway stage
  • Do I need tail-based sampling? — Adds a sampling stage with the load-balancing exporter and traceID routing
  • Do I want span-derived metrics (RED metrics)? — Requires spanmetrics derivation before sampling, with the pipeline split via the forward connector
  • How much data loss is acceptable during outages? — Determines in-memory queues vs. persistent queues vs. Kafka, applied to whichever stage needs protection
  • What is my expected data volume? — Determines whether capabilities can be co-located in a single deployment or need dedicated stages

The answers to these questions don't map to a single "correct" architecture. They constrain the design space, and within those constraints, you make trade-offs between simplicity and capability.

Exploring these patterns interactively

If you'd rather explore how these building blocks compose than assemble them by hand, OpenTelemetry Blueprints is an open-source tool that generates reference architectures from your requirements.

Toggle your environment, signals, volume, resilience, and processing needs — and get a composed diagram with animated data flow, interactive tooltips, and reference collector configurations you can open directly in OTelBin for validation.

The generated configurations export via OTLP, so they work with any OTLP-compatible backend — including Elastic Observability, which natively accepts and stores OTLP traces, metrics, and logs.

The architectures Blueprints generates are reference compositions — starting points for understanding how the building blocks fit together, not turnkey deployments. Every architecture should be adapted to your organisation's scale, security, networking, and compliance requirements. The patterns might combine or overlap differently in your environment than in anyone else's, and that's the point.

Get started

The architectures described here export over OTLP, so they work with any compatible backend. If you don't have one yet, the fastest way to see your telemetry flowing end-to-end is with Elastic Observability — it natively ingests OTLP traces, metrics, and logs with no additional configuration.

  1. Start a free trial on Elastic Cloud Serverless — no credit card required.
  2. Point your collector's OTLP exporter at the managed OTLP endpoint.
  3. Explore your traces, metrics, and logs in Kibana within minutes.
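Step 2 amounts to a one-exporter change — the endpoint and the API-key variable below are placeholders for the values from your own deployment:

```yaml
exporters:
  otlphttp:
    endpoint: https://my-project.ingest.example.cloud   # placeholder managed OTLP endpoint
    headers:
      Authorization: "ApiKey ${env:ELASTIC_API_KEY}"    # placeholder credential
```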

