Constanca Manteigas

Scale testing OpenTelemetry log ingestion on GCP with EDOT Cloud Forwarder

Learn how we load tested the EDOT Cloud Forwarder for GCP on Google Cloud Run and identified practical capacity limits per instance. We show how runtime tuning improves stability and translate the results into concrete configuration and scaling guidance.


EDOT Cloud Forwarder (ECF) for GCP is an event-triggered, serverless OpenTelemetry Collector deployment for Google Cloud. It runs the OpenTelemetry Collector on Cloud Run, ingests events from Pub/Sub and Google Cloud Storage, parses Google Cloud service logs into OpenTelemetry semantic conventions, and forwards the resulting OTLP data to Elastic, relying on Cloud Run for scaling, execution, and infrastructure lifecycle management.

To run ECF for GCP confidently at scale, you need to understand its capacity characteristics and sizing behavior. We answered those questions for ECF for GCP, which is part of the broader ECF architecture, through repeatable load testing, grounding every decision in measured data.

We'll introduce the test setup, explain each runtime setting, and share the capacity numbers we observed for a single instance.

How we load tested EDOT Cloud Forwarder for GCP

Architecture

The load testing architecture simulates a realistic, high-volume pipeline:

  1. We developed a load tester service that uploads generated log files to a Google Cloud Storage (GCS) bucket as fast as possible.
  2. Each file creation in the GCS bucket then triggers an event notification to Pub/Sub.
  3. Pub/Sub delivers push messages to a Cloud Run service where EDOT Cloud Forwarder fetches and processes these log files.

Our setup exposes two primary tunable settings that directly influence Cloud Run scaling behavior and memory pressure:

  • Request pressure using a concurrency setting (how many concurrent requests each ECF instance can handle).
  • Work per request using a log count setting (number of logs per file in each uploaded object).

In our tests, we used a testing system that:

  • Deploys the whole testing infrastructure. This includes the complete ECF infrastructure, a mock backend, etc.
  • Generates log files according to the configured log counts, using a ~1.4 KB Cloud Audit log entry as the template (see the sketch after this list).
  • Runs a matrix of tests across all combinations of concurrency and log volume.
  • Produces a report for each tested concurrency level in which several stats are reported, such as CPU usage and memory consumption.
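
To make the ingest side of the harness concrete, here is a minimal Go sketch of what a generate-and-upload step like this can look like: it builds a newline-delimited file containing N copies of a ~1.4 KB template entry and writes it to a GCS bucket. The bucket name, object naming, and template entry are illustrative assumptions, not the actual harness code.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"
	"time"

	"cloud.google.com/go/storage"
)

// buildLogFile repeats a single template log entry logCount times,
// producing a newline-delimited payload similar to the harness inputs
// (240 entries for the light case, 6k+ for the heavy case).
func buildLogFile(template []byte, logCount int) []byte {
	var buf bytes.Buffer
	for i := 0; i < logCount; i++ {
		buf.Write(template)
		buf.WriteByte('\n')
	}
	return buf.Bytes()
}

func main() {
	ctx := context.Background()

	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatalf("storage client: %v", err)
	}
	defer client.Close()

	// Illustrative values: bucket name and template entry are assumptions.
	bucket := client.Bucket("ecf-load-test-bucket")
	template := []byte(`{"protoPayload":{"serviceName":"example.googleapis.com"},"logName":"cloudaudit.googleapis.com%2Factivity"}`)

	// Each upload triggers a GCS notification to Pub/Sub, which in turn
	// pushes a request to the ECF Cloud Run service.
	payload := buildLogFile(template, 240) // light workload
	obj := bucket.Object(fmt.Sprintf("load/%d.ndjson", time.Now().UnixNano()))

	w := obj.NewWriter(ctx)
	if _, err := w.Write(payload); err != nil {
		log.Fatalf("write object: %v", err)
	}
	if err := w.Close(); err != nil {
		log.Fatalf("close object: %v", err)
	}
}
```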

For reproducibility and isolation, the otlphttp exporter in EDOT Cloud Forwarder uses a mock backend that always returns HTTP 200. This ensures all observed behavior is attributable to ECF itself, not downstream systems or network variability.
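
A sink like this is simple to reproduce. The sketch below is a hypothetical stand-in (the port and standalone layout are assumptions, not the harness's actual service): it accepts OTLP/HTTP log requests on the conventional /v1/logs path and unconditionally acknowledges them, so no failure or backpressure can originate downstream of ECF.

```go
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()

	// OTLP/HTTP log exporters POST to /v1/logs; drain the body and
	// return 200 so every export attempt succeeds immediately.
	mux.HandleFunc("/v1/logs", func(w http.ResponseWriter, r *http.Request) {
		io.Copy(io.Discard, r.Body)
		r.Body.Close()
		w.WriteHeader(http.StatusOK)
	})

	// 4318 is the conventional OTLP/HTTP port.
	log.Fatal(http.ListenAndServe(":4318", mux))
}
```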

Step 1: Establish a stable runtime before measuring capacity

Before asking how much load a single instance can handle, we first established a stable runtime baseline.

We quickly learned that a single flag, cpu_idle, can turn Cloud Run into a garbage-collector (GC) starvation trap. This is amplified by a known limitation of the current ECF architecture: the existing OpenTelemetry implementation reads whole log files into memory before processing them. Our goal was to eliminate configuration side effects so capacity tests reflected ECF's actual limits.

We focused on three runtime parameters:

| Setting | What it controls | Why it matters for ECF |
| --- | --- | --- |
| cpu_idle | Whether CPU is always allocated or only during requests | Dictates how much background time the garbage collector gets to reclaim memory |
| GOMEMLIMIT | Upper bound on Go heap size inside the container | Keeps the process from quietly growing until Cloud Run kills it on OOM |
| GOGC | Heap growth and collection aggressiveness in Go | Trades lower memory usage for higher CPU consumption |

All parameter-isolation tests use a single Cloud Run instance (min 0, max 1), fix concurrency for the scenario under study, and keep input files and test matrix identical across runs. This design lets us attribute differences directly to the parameter in question.

CPU allocation: Stop starving the garbage collector

Cloud Run offers two CPU allocation modes:

  • Request-based (throttled). Enabled with cpu_idle: true. CPU is available only while a request is actively being processed.
  • Instance-based (always on). Enabled with cpu_idle: false. CPU remains available when idle, allowing background work such as garbage collection to run.

The tests compared these modes under identical conditions:

| Parameter | Value |
| --- | --- |
| vCPU | 1 |
| Memory | 4 GiB (high enough to remove OOM as a factor) |
| GOMEMLIMIT | 90% of memory |
| GOGC | Default (unset) |
| Concurrency | 10 |

What we observed

With CPU allocated only on requests (cpu_idle: true):

  • Memory variance was extreme (±71% RSS, ±213% heap).
  • Peak heap reached ~304 MB in the worst run.
  • We saw request refusals in the sample (90% success rate).

With CPU always allocated (cpu_idle: false):

  • Memory variance became tightly bounded (±8% RSS, ±32% heap).
  • Peak heap dropped to ~89 MB in the worst run.
  • We saw no refusals in the sample (100% success).

From these runs we saw:

  • When CPU is throttled, the Go garbage collector is effectively starved, leading to heap accumulation and large run-to-run variance.
  • When CPU is always available, garbage collection keeps pace with allocation, resulting in lower and more predictable memory usage.

Takeaway: for this set of tests, cpu_idle: false was the most stable baseline configuration. Request-based CPU throttling introduced artificial instability that made capacity planning much harder.

Go memory limit: GOMEMLIMIT in constrained containers

Cloud Run enforces a hard memory limit at the container level. If the process exceeds it, the instance is OOM-killed.

We tested Cloud Run with:

| Parameter | Value |
| --- | --- |
| Container memory | 512 MiB |
| vCPU | 1 |
| Concurrency | 20 |
| GOGC | Default (unset) |
| cpu_idle | false |

The tests compared:

  • No GOMEMLIMIT (Go relies on OS pressure).
  • GOMEMLIMIT=460MiB (about 90% of container memory).

The results were clear:

| GOMEMLIMIT | Outcome | Notes |
| --- | --- | --- |
| Unset | Unstable; repeated OOM kills | Service never produced stable results |
| 460MiB | Stable; runs completed | Worst-case peak RSS reached ~505 MB, but the process stayed within container limits |

Takeaway: in a memory-constrained environment like Cloud Run, setting GOMEMLIMIT close to (but below) the container limit is essential for predictable behavior under load.
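
In practice, GOMEMLIMIT is usually just set as an environment variable on the service (for example GOMEMLIMIT=460MiB for a 512 MiB container) and the Go runtime picks it up automatically. The sketch below shows the equivalent programmatic form, deriving the limit from a container-memory value passed in a hypothetical environment variable; it is an illustration, not ECF's startup code.

```go
package main

import (
	"log"
	"os"
	"runtime/debug"
	"strconv"
)

func main() {
	// CONTAINER_MEMORY_BYTES is a hypothetical variable carrying the
	// Cloud Run container limit (e.g. 536870912 for 512 MiB).
	raw := os.Getenv("CONTAINER_MEMORY_BYTES")
	containerBytes, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		log.Fatalf("invalid CONTAINER_MEMORY_BYTES: %v", err)
	}

	// Cap the Go heap at ~90% of the container limit so the runtime
	// collects aggressively before Cloud Run OOM-kills the instance.
	softLimit := containerBytes * 9 / 10
	debug.SetMemoryLimit(softLimit)

	log.Printf("memory limit set to %d bytes", softLimit)

	// ... start the collector / HTTP server here ...
}
```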

GOGC: memory savings vs. reliability

The GOGC parameter controls how much the heap can grow (in %) between GC cycles; a short sketch after the list below makes the mechanics concrete:

  • Lower values (e.g., GOGC=50): more frequent collections, lower memory, higher CPU.
  • Higher values (e.g., GOGC=100): fewer collections, higher memory, lower CPU.
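
As a rough mental model: after a collection, the runtime lets the heap grow by about GOGC percent of the live heap before collecting again, so GOGC=100 allows the heap to roughly double while GOGC=50 allows only ~50% growth. The tiny Go sketch below shows the programmatic equivalent of the environment variable; it is purely illustrative and not something ECF does.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to running with GOGC=50: collect once the heap has
	// grown ~50% beyond the live set, trading CPU for lower memory.
	previous := debug.SetGCPercent(50)
	fmt.Printf("GOGC changed from %d to 50\n", previous)

	// Restore the default (GOGC=100), the setting the tests
	// ultimately recommend keeping.
	debug.SetGCPercent(100)
}
```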

The tests covered: (1) GOGC=50 (aggressive); (2) GOGC=75 (moderate); (3) GOGC=100 (default/unset).

Setup:

| Parameter | Value |
| --- | --- |
| Container memory | 4 GiB (high enough to remove OOM as a factor) |
| vCPU | 1 |
| Concurrency | 10 (safe level) |
| GOMEMLIMIT | 90% of memory |
| cpu_idle | false |

What we observed

From the runs:

| GOGC | Peak RSS (sample) | CPU behavior | Failure rate | Notes |
| --- | --- | --- | --- | --- |
| 50 | ~267 MB | Very high; often saturating | 30% | GC consumed cycles needed for ingestion |
| 75 | ~454 MB | ~83.5% avg | 10% | GC consumed cycles needed for ingestion |
| 100 (default) | ~472 MB | ~83.5% avg; leaves headroom for bursts | 0% | — |

The conclusion from these runs is clear: pushing GOGC down trades reliability for memory savings, and the trade is not favorable for ECF.

Takeaway: for this workload, the default GOGC=100 provided the best balance. Attempts to optimize memory by lowering GOGC directly reduced reliability.

Step 2: Find capacity and breaking points

With the runtime stabilized, we evaluated how much traffic a single instance can sustain by increasing concurrency until failures emerged.

How to read the tables: each concurrency level was tested across 20 runs covering both light inputs (240 logs per file, around 362 KB per file) and heavy inputs (over 6k logs per file, around 8 MB per file). Tables report baseline RSS from the light workloads and peak values from the worst-case run.

Concurrency 5: Stable baseline

At concurrency 5, the service was solid.

| Case | Memory (RSS) | CPU utilization | Requests refused |
| --- | --- | --- | --- |
| Baseline (lightest workload avg) | 99.89 MB | — | — |
| Worst run | 211.02 MB | 86.43% | No |

This proved that a single instance handles a moderate load comfortably, with memory usage staying well within safe limits.

Concurrency 10: Safe but volatile

At concurrency 10, the system remained functional but with significant volatility.

| Case | Memory (RSS) | CPU utilization | Requests refused |
| --- | --- | --- | --- |
| Baseline (lightest workload avg) | 100.33 MB | — | — |
| Worst run | 424.80 MB | 94.10% | No (in sample) |

We also noticed that memory usage shows extreme variance:

  • Best run RSS: 178 MB.
  • Worst run RSS: 425 MB.

This behavior comes mainly from two effects:

  • Bursty Pub/Sub delivery: 10 heavy requests may land at nearly the same instant.
  • The use of io.ReadAll inside the collector: each request reads the entire log file into memory.

When all 10 requests arrive concurrently, the instance is effectively stacking ~10× the file size in RAM before the GC can clean up. When they are slightly staggered, the GC has time to reclaim memory between requests, leading to much lower peaks.
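
The sketch below illustrates this whole-file-read pattern in a request handler; it is not ECF's actual code, and the bucket and object names (which would normally come from the Pub/Sub push payload) are placeholders.

```go
package handler

import (
	"io"
	"net/http"

	"cloud.google.com/go/storage"
)

// handleObject illustrates the whole-file-read pattern: every push
// request fetches the referenced GCS object and buffers it entirely.
func handleObject(client *storage.Client) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()

		rc, err := client.Bucket("ecf-source-bucket").Object("example.ndjson").NewReader(ctx)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer rc.Close()

		// io.ReadAll holds the entire object in memory. Ten concurrent
		// heavy requests therefore keep ~10x the file size live on the
		// heap before the garbage collector can reclaim any of it.
		data, err := io.ReadAll(rc)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		_ = data // parse into OTLP log records and export here
		w.WriteHeader(http.StatusOK)
	}
}
```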

This leads to a crucial sizing insight:

  • Do not size the service using average memory (for example, ~260 MB).
  • Size it for the worst observed burst (~425 MB) to avoid OOM or GC stalls.

In practice, you should set the memory limit to at least 512 MiB per instance at concurrency 10.

Concurrency 20: Unstable, systemic load shedding

At concurrency 20, the system consistently began shedding load.

| Case | Memory (RSS) | CPU utilization | Requests refused |
| --- | --- | --- | --- |
| Baseline (lightest workload avg) | 97.44 MB | — | — |
| Worst run | 482.42 MB | 88.90% | Yes (every run) |

Even though memory and CPU metrics don't look drastically worse than at concurrency 10, behavior changes qualitatively: the service begins to refuse requests consistently.

Concurrency 40: Failure mode

At concurrency 40, the instance collapsed completely: memory and CPU were overwhelmed, and ingest reliability broke down.

| Case | Memory (RSS) | CPU utilization | Requests refused |
| --- | --- | --- | --- |
| Baseline (lightest workload avg) | 100.20 MB | — | — |
| Worst run | 1234.28 MB | 96.57% | Yes (all runs) |

The breaking point: a 1 vCPU instance's realistic limits

| Concurrency | Peak RSS (MB) | Stability | Refusals? | Status |
| --- | --- | --- | --- | --- |
| 5 | 211.02 | Low variance | No | Stable baseline |
| 10 | 424.80 | High variance | No | Safe but volatile |
| 20 | 482.42 | High variance | Yes (frequent) | Unstable (sheds load) |
| 40 | 1234.28 | Extreme variance | Yes (always) | Failure (memory explosion) |

Combined with the CPU data (94% peak at concurrency 10), this supports a practical rule: for this workload and architecture, 10 concurrent heavy requests per 1 vCPU instance is the realistic upper bound.

Turning findings into concrete recommendations

These experiments lead to clear, actionable recommendations for running the ECF OpenTelemetry collector on Cloud Run as part of the broader EDOT Cloud Forwarder deployment.

Scope: these recommendations apply to the workload and harness we tested (light and heavy log files up to ~8 MB, delivered in Pub/Sub bursts), using the tuned runtime settings listed below. If your log sizes, request burstiness, or pipeline shape differ significantly, validate these limits against your own traffic.

Runtime and container configuration

| Area | Recommendation | Rationale |
| --- | --- | --- |
| CPU allocation | Set cpu_idle: false (always-on CPU) | Avoids GC starvation, stabilizes memory variance, and eliminates request failures caused by long GC pauses |
| Go memory limit | Set GOMEMLIMIT to ~90% of container memory | Enforces a heap boundary aligned with the Cloud Run limit so that Go reacts before the OS, preventing OOM kills |
| Garbage collection | Keep GOGC at 100 (default) | Lower GOGC reduces memory at the cost of higher CPU usage and measurable failure rates |

Capacity and per-instance limits

For a 1 vCPU Cloud Run instance running the ECF OpenTelemetry collector with the tuned runtime:

| Limit | Recommendation | Rationale |
| --- | --- | --- |
| Hard concurrency | Cap concurrency at 10 requests per instance | At concurrency 10, CPU already reaches ~94% in the worst run; higher concurrency drives instability (refusals, GC stalls) |
| Memory | Use at least 512 MiB per instance (for concurrency 10) | Worst-case observed RSS is ~425 MB; 512 MiB provides a narrow but workable safety margin against burst alignment |

Scaling strategy: horizontal, not vertical

  • Vertical scaling (increasing concurrency per instance) quickly runs into CPU and memory limits for this workload.
  • Horizontal scaling is a better fit: treat each instance as a worker with a hard limit of 10 concurrent heavy jobs.

Practically:

  • Configure the service so that no instance exceeds 10 concurrent requests.
  • Let autoscaling handle an increased load by adding instances, not by increasing per-instance concurrency.

Takeaways

  • Tuned runtime settings matter as much as raw resources: a single flag like cpu_idle can be the difference between predictable behavior and GC-driven chaos.
  • Go needs explicit limits in containers: GOMEMLIMIT must be set in memory-constrained environments; otherwise, OOM kills are inevitable under heavy ingestion.
  • "Lower memory" is not always better: aggressive GC tuning (GOGC < 100) did reduce memory usage but directly increased failure rates.
  • Concurrency 10 is the realistic ceiling for a 1 vCPU ECF instance; beyond that, refusals and instability become the norm.
  • Horizontal scaling is the right model: each instance should be treated as a 10-request worker, with higher total throughput coming from more workers rather than more concurrency per worker.
