Vinay Chandrasekhar

Explore and Analyze Metrics with Ease in Elastic Observability

The latest enhancements to ES|QL and Discover-based metrics exploration unleash a potent set of tools for quick and effective metrics analytics.


Metrics are critical in identifying the “what”

As a core pillar of Observability, metrics offer a highly structured, quantitative view of system performance and health. They provide a crucial symptomatic perspective—revealing what is happening, such as high application latency, increasing service errors, or spiking container CPU utilization, which is essential for initiating alerting and triaging efforts. This capability for effective monitoring, alerting, and triaging is paramount to ensuring robust service delivery and achieving successful business outcomes.

Elastic Observability provides a comprehensive, end-to-end experience for metrics data. Metrics can be collected from numerous sources, enriched as needed, and shipped to the Elastic Stack. Elastic efficiently stores this time series data, including high-cardinality metrics, in time series data streams (TSDS), which use the time_series index mode. TSDS was introduced in earlier versions and is used across Elastic’s time series integrations. This foundation delivers observability through out-of-the-box dashboards, alerts, SLOs, and streamlined data management.

Elastic Observability 9.2 enhances metrics exploration and analysis with powerful query language extensions and expanded UI capabilities. These enhancements make analyzing TSDS data, for example computing counter rates and common aggregations over time, easier and faster than ever before.

The main metrics enhancements center on three key features, all offered in Tech Preview:

  1. Metrics analytics with TSDS and ES|QL
  2. Interactive metrics exploration in Discover
  3. OTLP endpoint for metrics

Metrics analytics with TSDS and ES|QL

The introduction of the new TS source command in ES|QL (Elasticsearch Query Language) for TSDS metrics dramatically simplifies time series analysis.

The TS command is specifically designed to target only time series indices, which differentiates it from the general-purpose FROM command. Its core power lies in enabling a dedicated suite of time series aggregation functions within the STATS command.
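To make the contrast concrete, here is a minimal sketch of two queries over the same data. The index name metrics and the fields memory_usage (a gauge), search_requests (a counter), and host (a dimension) are hypothetical names borrowed from the examples in this post, not a real schema.

```esql
// FROM works on any index; standard aggregations such as AVG() are available
FROM metrics
| STATS AVG(memory_usage) BY host

// TS targets time series indices and unlocks time series functions such as RATE()
TS metrics
| STATS SUM(RATE(search_requests)) BY host
```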

This mechanism uses a dual aggregation model, which is standard in time series querying. These queries involve two aggregation functions:

  • Inner (Time Series) function: Applied implicitly per time series, often over bucketed time intervals.

  • Outer (Regular) function: Used to aggregate the results of the inner function across groups. For instance, in STATS SUM(RATE(search_requests)) BY TBUCKET(1 hour), host, RATE() is the inner function, applied to each time series in hourly buckets, and SUM() is the outer function, summing these rates for each host and hourly bucket.
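Written out as a complete query, the example above might look like the following sketch. The index name metrics is an assumption, and search_requests and host are taken from the example rather than from a real schema. Naming the aggregate and the bucket makes the result columns easier to sort and reuse.

```esql
TS metrics
// Inner: RATE() runs per time series within each hourly bucket.
// Outer: SUM() adds up those per-series rates for each host and bucket.
| STATS total_rate = SUM(RATE(search_requests))
    BY time_bucket = TBUCKET(1 hour), host
| SORT time_bucket, host
```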

If an ES|QL query using the TS command is missing an inner (time series) aggregation function, LAST_OVER_TIME() is applied implicitly. For example, TS metrics | STATS AVG(memory_usage) is equivalent to TS metrics | STATS AVG(LAST_OVER_TIME(memory_usage)).
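The default only matters when you do not pick an inner function yourself. As a sketch using the same hypothetical metrics index and memory_usage field, the two queries below differ only in the inner function: the first takes the latest sample of each series per bucket, while the second averages every sample of each series per bucket before the outer AVG() combines the series.

```esql
// Implicit inner function: same as AVG(LAST_OVER_TIME(memory_usage))
TS metrics
| STATS AVG(memory_usage) BY TBUCKET(5 minutes)

// Explicit inner function: average each series over the bucket first
TS metrics
| STATS AVG(AVG_OVER_TIME(memory_usage)) BY TBUCKET(5 minutes)
```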

Key time series aggregation functions available in ES|QL via the TS command

These functions enable powerful analysis of time series data:

| Function | Description | Example use case |
|---|---|---|
| RATE() / IRATE() | RATE calculates the per-second average rate of increase of a counter, accounting for non-monotonic breaks such as counter resets, which makes it the most appropriate function for counters. IRATE calculates the per-second rate of increase between the last two data points only, ignoring all earlier points for high responsiveness. | Calculating requests per second (RPS) or throughput. |
| AVG_OVER_TIME() | Calculates the average of a numeric field over the time range. | Determining average resource usage over an hour. |
| SUM_OVER_TIME() | Calculates the sum of a field over the time range. | Total errors over a specific time window. |
| MAX_OVER_TIME() / MIN_OVER_TIME() | Calculates the maximum or minimum value of a field over the time range. | Identifying peak resource consumption. |
| DELTA() / IDELTA() | DELTA calculates the absolute change of a gauge field over a time window; IDELTA calculates it between the last two data points only, which makes IDELTA more responsive to recent changes. | Tracking changes in system gauge metrics (e.g., buffer size). |
| INCREASE() | Calculates the absolute increase of a counter over the time range. | Measuring how much a counter, such as total requests served, grew within a time window. |
| FIRST_OVER_TIME() / LAST_OVER_TIME() | Calculates the earliest or latest recorded value of a field, as determined by the @timestamp field. | Inspecting initial and final metric states within a bucket. |
| ABSENT_OVER_TIME() / PRESENT_OVER_TIME() | Calculates the absence or presence of a field over the time range. | Identifying monitoring coverage gaps. |
| COUNT_OVER_TIME() / COUNT_DISTINCT_OVER_TIME() | Calculates the total count or the count of distinct values of a field over the time range. | Measuring frequency or cardinality changes. |
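A few of these functions in context, as a sketch: the index metrics, the counter search_requests, the gauge memory_usage, and the host dimension are hypothetical names carried over from the earlier examples, and the bucket sizes are arbitrary.

```esql
// Requests per second per host, in 1-minute buckets
TS metrics
| STATS rps = SUM(RATE(search_requests)) BY host, TBUCKET(1 minute)

// Total requests served per host in each hour
TS metrics
| STATS requests = SUM(INCREASE(search_requests)) BY host, TBUCKET(1 hour)

// Peak memory usage per host, in 15-minute buckets
TS metrics
| STATS peak_memory = MAX(MAX_OVER_TIME(memory_usage)) BY host, TBUCKET(15 minutes)
```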

These functions, available with the TS command, let SREs and Ops teams easily perform rate calculations and other common aggregations, making efficient metrics analysis a routine part of observability workflows. And it’s much faster, too! Internal performance testing has shown that TS queries consistently outperform other ways of querying metrics data by an order of magnitude or more.

Interactive metrics exploration in Discover

The 9.2 release introduces the capability to explore and analyze metrics directly and interactively within the Discover interface. In addition to exploring and analyzing logs and raw events, Discover now provides a dedicated environment for metrics exploration:

  • Easy start: Begin exploring simply by querying your metrics with TS metrics-*.

  • Grid view and pre-applied aggregations: This query displays all metrics at a glance in a grid, automatically applying the appropriate aggregation for each metric type, such as rate for counters versus avg for gauges.

  • Search and group-by: Quickly search for specific metrics by name, and group and analyze metrics by dimensions (labels) and their values. This lets you narrow down to the metrics and dimensions of your choice for targeted analysis.

  • Quick access to details: For each metric, the interface provides key details, including the query and response, the underlying ES|QL query, the metric field type, and the applicable dimensions.

  • Easy tweaking and dashboarding: Discover automatically populates the ES|QL query, making it easy to tweak the query and slice and dice the data. Once you finish your analysis, you can add the metrics and results to new or existing dashboards.
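As an example of the kind of tweak the auto-populated query invites, here is a sketch that narrows the Easy start query to one service and breaks a counter down by host. The service.name dimension, its value checkout, and the search_requests and host fields are hypothetical, and the query Discover generates for your data may look different.

```esql
TS metrics-*
// Keep only one service (hypothetical dimension and value)
| WHERE service.name == "checkout"
// Per-host request rate in 5-minute buckets
| STATS rps = SUM(RATE(search_requests)) BY host, TBUCKET(5 minutes)
```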

OTLP endpoint for metrics

We are also introducing a native OpenTelemetry Protocol (OTLP) endpoint for ingesting metrics directly into Elasticsearch. The endpoint especially benefits self-managed customers and will be integrated into our Elastic Cloud Managed OTLP Endpoint for Elastic-managed offerings. The native endpoint and related updates improve the ingest performance and scalability of OTel metrics, providing up to 60% higher throughput via _otlp and up to 25% higher throughput with the classic _bulk API.

In Conclusion

By combining the power of ES|QL’s new time series aggregations with the familiar interactive experience of Discover, Elastic 9.2 delivers a potent set of metrics analytics tools that significantly boost the exploration and analysis phase of any observability workflow. And we’re just getting started on unleashing the full power of metrics in Elastic Observability!

Try the new features today!

Learn more on Observability Labs about how we provide metrics analytics for AWS, Azure, GCP, Kubernetes, and LLMs.
