Metrics are critical in identifying the “what”
As a core pillar of observability, metrics offer a highly structured, quantitative view of system performance and health. They provide the symptomatic perspective: they reveal what is happening, such as high application latency, increasing service errors, or spiking container CPU utilization, which is what initiates alerting and triage. Effective monitoring, alerting, and triage are in turn essential to robust service delivery and successful business outcomes.
Elastic Observability provides an end-to-end experience for metrics data. Metrics can be collected from numerous sources, enriched as needed, and shipped to the Elastic Stack. Elastic stores this time series data efficiently, including high-cardinality metrics, using the time series data stream (TSDS) index mode, which was introduced in earlier versions and is used across Elastic time series integrations. On this foundation, Elastic delivers out-of-the-box dashboards, alerts, SLOs, and streamlined data management.
Elastic Observability 9.2 enhances metrics exploration and analysis with query language extensions and expanded UI capabilities. These enhancements make it easier and faster to analyze TSDS data using counter rates and common aggregations over time.
The main metrics enhancements center on these key features, offered as Tech Preview:
- Metrics analytics with TSDS and ES|QL
- Interactive metrics exploration in Discover
- OTLP endpoint for metrics
Metrics analytics with TSDS and ES|QL
The introduction of the new
The
This mechanism utilizes a dual aggregation paradigm, which is standard for time series querying. These queries involve two aggregation functions:
- Inner (Time Series) function: Applied implicitly per time series, often over bucketed time intervals.
- Outer (Regular) function: Used to aggregate the results of the inner function across groups. For instance, if you use `STATS SUM(RATE(search_requests)) BY TBUCKET(1 hour), host`, the `RATE()` function is the inner function applied per time series in hourly buckets, and `SUM()` is the outer function, summing these rates for each host and hourly bucket.
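To make the two-level aggregation concrete, here is a small Python sketch that mimics what `SUM(RATE(...)) BY TBUCKET(1 hour), host` computes. It is an illustration only, not Elasticsearch's implementation: the function names, the `(series_id, host, timestamp, value)` tuple layout, and the simple "a drop means the counter restarted from zero" reset rule are all assumptions made for this example.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # one-hour buckets, mirroring TBUCKET(1 hour)


def counter_rate(samples):
    """Per-second increase across (timestamp, value) samples of one counter.

    A drop in value is treated as a counter reset: the counter is assumed to
    have restarted from zero, so the new value counts as the increase.
    """
    samples = sorted(samples)
    if len(samples) < 2:
        return None  # a rate needs at least two points
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return total / elapsed if elapsed > 0 else None


def sum_of_rates(points):
    """points: iterable of (series_id, host, timestamp_seconds, counter_value)."""
    # Inner step: collect samples per time series and per hourly bucket.
    per_series = defaultdict(list)
    for series_id, host, ts, value in points:
        bucket = ts - ts % BUCKET_SECONDS
        per_series[(series_id, host, bucket)].append((ts, value))

    # Outer step: sum the per-series rates for each (bucket, host) group.
    totals = defaultdict(float)
    for (_, host, bucket), samples in per_series.items():
        rate = counter_rate(samples)
        if rate is not None:
            totals[(bucket, host)] += rate
    return dict(totals)


# Hypothetical data: two series on host-1 (one with a reset), one on host-2.
example_points = [
    ("a", "host-1", 0, 100), ("a", "host-1", 1800, 1900),
    ("b", "host-1", 0, 500), ("b", "host-1", 900, 1400), ("b", "host-1", 1800, 900),
    ("c", "host-2", 3600, 10), ("c", "host-2", 5400, 910),
]
print(sum_of_rates(example_points))
```

The point of the sketch is the ordering: the rate is computed separately for each time series before anything is summed, so a reset in one series does not distort the others in the same host group.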
If an ES|QL query using the
Key time series aggregation functions available in ES|QL via TS command
These functions allow for powerful analysis on time-series data:
| Function | Description | Example Use Case |
| --- | --- | --- |
| `RATE()` / `IRATE()` | Calculates the per-second average rate of increase of a counter (`RATE`), accounting for non-monotonic breaks like counter resets, making it the most appropriate function for counters, or the per-second rate of increase between the last two data points (`IRATE`), ignoring all but the last two points for high responsiveness. | Calculating requests per second (RPS) or throughput. |
| `AVG_OVER_TIME()` | Calculates the average of a numeric field over the defined time range. | Determining average resource usage over an hour. |
| `SUM_OVER_TIME()` | Calculates the sum of a field over the time range. | Total errors over a specific time window. |
| `MAX_OVER_TIME()` / `MIN_OVER_TIME()` | Calculates the maximum or minimum value of a field over time. | Identifying peak resource consumption. |
| `DELTA()` / `IDELTA()` | Calculates the absolute change of a gauge field over a time window (`DELTA`) or specifically between the last two data points (`IDELTA`), making `IDELTA` more responsive to recent changes. | Tracking changes in system gauge metrics (e.g., buffer size). |
| `INCREASE()` | Calculates the absolute increase of a counter (`INCREASE`). | Analyzing immediate rate changes in fast-moving counters. |
| `FIRST_OVER_TIME()` / `LAST_OVER_TIME()` | Calculates the earliest or latest recorded value of a field, determined by the @timestamp field. | Inspecting initial and final metric states within a bucket. |
| `ABSENT_OVER_TIME()` / `PRESENT_OVER_TIME()` | Calculates the absence or presence of a field in the result over the time range. | Identifying monitoring coverage gaps. |
| `COUNT_OVER_TIME()` / `COUNT_DISTINCT_OVER_TIME()` | Calculates the total count or the count of distinct values of a field over time. | Measuring frequency or cardinality changes. |
These functions, available with the
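The difference between the windowed functions and their "instant" counterparts is easiest to see on a handful of samples. The Python sketch below follows the descriptions in the table for `RATE`/`IRATE`, `INCREASE`, and `DELTA`/`IDELTA`. It is a simplified model under stated assumptions, not the ES|QL implementation: the lowercase function names are hypothetical, a drop in a counter is assumed to be a reset to zero, and any boundary extrapolation the real functions may perform is ignored.

```python
def increase(samples):
    """Total increase of a counter over (timestamp, value) samples.

    A drop is treated as a reset to zero, so the value after the drop is
    counted as the increase for that step.
    """
    pts = sorted(samples)
    total = 0.0
    for (_, prev), (_, cur) in zip(pts, pts[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def rate(samples):
    """Average per-second increase over the whole window (needs >= 2 points)."""
    pts = sorted(samples)
    return increase(pts) / (pts[-1][0] - pts[0][0])


def irate(samples):
    """Per-second increase between the last two points only."""
    return rate(sorted(samples)[-2:])


def delta(samples):
    """Change of a gauge between the first and last point of the window."""
    pts = sorted(samples)
    return pts[-1][1] - pts[0][1]


def idelta(samples):
    """Change of a gauge between the last two points only."""
    return delta(sorted(samples)[-2:])


# Hypothetical samples: (seconds, value). The counter resets at t=120.
counter = [(0, 100), (60, 160), (120, 20), (180, 140)]
gauge = [(0, 40.0), (60, 55.0), (120, 35.0)]

print(increase(counter), rate(counter), irate(counter))
print(delta(gauge), idelta(gauge))
```

With these samples the windowed rate averages over the reset, while `irate` reflects only the most recent step, which is why the instant variants react faster to recent changes.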
Interactive metrics exploration in Discover
The 9.2 release introduces the capability to explore and analyze metrics directly and interactively within the Discover interface. In addition to exploring and analyzing logs and raw events, Discover now provides a dedicated environment for metrics exploration:
- Easy start: Begin exploration simply by querying ingested metrics with `TS metrics-*`.
- Grid view and pre-applied aggregations: This command displays all metrics in a grid format at a glance, immediately applying the appropriate aggregations based on the metric type, such as `rate` versus `avg`.
- Search and group-by: Quickly search for specific metrics by name, and group and analyze metrics by dimensions (labels) and specific values. This allows narrowing down to the metrics and dimensions of choice for targeted analysis.
- Quick access to details: For each metric, the interface provides access to crucial details, including query and response details, the underlying ES|QL commands, the metric field type, and applicable dimensions.
- Easy tweaking and dashboarding: The system automatically populates ES|QL queries, making it easy to tweak, slice, and dice the data. Once analyzed, metrics and resulting analyses can be added to new or existing dashboards.
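As a rough picture of what "pre-applied aggregations based on the metric type" could look like, the Python sketch below builds an ES|QL query string from a metric's type. Everything here is an assumption for illustration: the article does not show the queries Discover actually generates, so the `DEFAULT_AGGREGATIONS` mapping, the `default_query` helper, and the field names are hypothetical. Only the overall query shape is taken from the `STATS SUM(RATE(...)) BY TBUCKET(...)` example earlier in the article.

```python
# Hypothetical defaults: counters get a rate, gauges get an average.
DEFAULT_AGGREGATIONS = {
    "counter": ("SUM", "RATE"),
    "gauge": ("AVG", "AVG_OVER_TIME"),
}


def default_query(index_pattern, field, metric_type, bucket="5 minutes"):
    """Build an ES|QL query string with an outer and inner aggregation."""
    outer, inner = DEFAULT_AGGREGATIONS[metric_type]
    return (
        f"TS {index_pattern} "
        f"| STATS {outer}({inner}({field})) BY TBUCKET({bucket})"
    )


# Field names below are made up for the example.
print(default_query("metrics-*", "search_requests", "counter", "1 hour"))
print(default_query("metrics-*", "buffer_size", "gauge"))
```

A generated query like this is a starting point: the text can be edited by hand, for example to add a dimension to the `BY` clause.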
OTLP endpoint for metrics
We are also introducing a native OpenTelemetry Protocol (OTLP) endpoint specifically for metrics ingest directly into Elasticsearch. The endpoint especially benefits self-managed customers, and will be integrated into our Elastic Cloud Managed OTLP Endpoint for Elastic-managed offerings. The native endpoint and related updates improve ingest performance and scalability of OTel metrics, providing up to 60% higher throughput via
In Conclusion
By merging the power of ES|QL's new time series aggregations with the familiar interactive experience of Discover, Elastic 9.2 delivers a potent set of metrics analytics tools. These tools significantly boost the exploration and analysis phase of any observability workflow. And we’re just getting started on unleashing the full power of metrics in Elastic Observability!
We welcome you to try the new features today!
You can also learn more about how we provide metrics analytics for AWS, Azure, GCP, Kubernetes, and LLMs on Observability Labs.
