ES|QL now includes full text functions that can be used to filter your data using text queries. We will review the available text filtering methods and understand why these functions provide a better alternative. We will also look at the future improvements for full text functions in ES|QL.
Filtering text with ES|QL
Text data in logs is critical for understanding, monitoring, and troubleshooting systems and applications. The unstructured nature of text allows for flexibility in capturing all sorts of information.
Because text is unstructured, we need ways to isolate specific patterns, keywords, or phrases. Searching for an error message, narrowing down results using tags, or looking for a specific host name are things we do all the time to refine our results and eventually obtain the information we're looking for.
ES|QL provides different methods to help you work with text. Elasticsearch 8.17 adds the full text functions match and qstr in tech preview to help tackle more complex search use cases.
Limitations of text filtering
ES|QL already provided text filtering capabilities, including:
- Text equality, to compare full strings directly using the equality operator.
- String start and end, using the STARTS_WITH and ENDS_WITH functions.
- Pattern and regex matching with the LIKE and RLIKE operators.
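As a rough sketch of what these options look like in practice, the queries below filter a hypothetical logs index. The index name and the host.name and message fields are assumptions made for illustration, and each query is meant to be run on its own.

```esql
// Text equality: the whole value must match, including case.
FROM logs
| WHERE host.name == "web-01"

// String start: keep messages that begin with a given prefix.
FROM logs
| WHERE STARTS_WITH(message, "ERROR")

// Pattern matching: * matches any run of characters, ? a single character.
FROM logs
| WHERE message LIKE "*connection lost*"

// Regex matching: the expression has to match the entire value.
FROM logs
| WHERE message RLIKE ".*connection (lost|reset).*"
```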
Text filtering is useful - but it can fall short on text oriented use cases:
Multivalued fields
Using ES|QL functions with multivalued fields can be tricky - functions return null when applied to a multivalued field.
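If you need to apply a function to a multivalued field, you first need to collapse it into a single value using MV_CONCAT, which takes the field and a delimiter, and then match on the concatenated string. Note that a pattern like the one below also matches any tag that merely contains the word:
FROM logs
| EVAL all_tags = MV_CONCAT(tags, ",")
| WHERE all_tags LIKE "*production*"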
Analyzed text
Analyzers are incredibly useful for full text search as they allow transforming text. They allow us to extract and modify the indexed text, and modify the queries so we can maximize the possibility of finding what we're looking for.
Text is not analyzed when using text filtering. This means for example that you need to match the text case when searching, or create regexes / patterns that address possible case differences.
This becomes more problematic when searching multilingual text (where ASCII folding is not available), matching on parts of paths (path hierarchy), or removing stopwords.
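For instance, one way to work around case differences with plain text filtering is to normalize the value at query time. This is only a sketch, assuming a hypothetical logs index with a message field.

```esql
// Lowercase the field before matching so "Connection Lost" and "connection lost" both pass.
FROM logs
| WHERE TO_LOWER(message) LIKE "*connection lost*"
```

This handles case only; accents, stopwords, and path segments would each need their own workaround.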
Performance
Pattern matching and regexes take time. Lucene can do a lot of the heavy lifting by creating finite automata to match using the indexed terms dictionary, but nonetheless it's a computationally intensive process.
As you can see in our 8.17 release blog, using regular expressions can be 50 to 1000 times slower than using full text functions for text filtering, depending on your data set.
Enter full text functions
Elasticsearch 8.17 and Serverless introduced two new functions as tech preview for text matching: MATCH and query string (abbreviated QSTR).
These functions address some of the limitations that existed for text filtering:
- They can be used directly on multivalued fields. They will return results when any of the values in a multivalued field matches the query.
- They use analyzers for text fields. The query will be analyzed using any existing analyzers for the target fields, which will allow matching regardless of case. This also unlocks ASCII folding, removing stopwords, and even using synonyms.
- They are performant. Instead of relying on pattern matching or regular expressions, they can directly use Lucene index structures to locate specific terms in your data.
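As a sketch of the first point, a full text function can target a multivalued field directly, with no concatenation step. The logs index and the tags field are assumed names for illustration.

```esql
// Keeps a row when any value in tags matches "production".
FROM logs
| WHERE match(tags, "production")
```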
MATCH function
MATCH allows matching a value on a specific field:
FROM logs
| WHERE match(message, "connection lost")
Match function uses a match query under the hood. This means that it will create a boolean query when multiple terms are used, with OR as the default operator for combining them.
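Because OR is the default, the query above returns messages that contain either connection or lost, not only messages that contain both. One possible way to require both terms in 8.17 is to combine two MATCH calls with AND. This is a sketch, assuming the same logs index and that message is an analyzed text field.

```esql
// Each MATCH call looks for one term; AND requires both to be present.
FROM logs
| WHERE match(message, "connection") AND match(message, "lost")
```

Note that this checks that both terms appear, not that they are adjacent or in order.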
Match function currently has some limitations:
- It does not provide a way to specify parameters. It will use the defaults for the match query.
- It can only be used in WHERE clauses.
- It can't be used after a STATS or LIMIT command.
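The ordering restriction can be sketched as follows, assuming a hypothetical logs index with message and host.name fields: apply MATCH in a WHERE clause first, then aggregate or limit.

```esql
// Works: the full text filter runs before STATS.
FROM logs
| WHERE match(message, "connection lost")
| STATS events = COUNT(*) BY host.name

// Rejected under the limitation above: MATCH appears after LIMIT.
// FROM logs
// | LIMIT 100
// | WHERE match(message, "connection lost")
```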
The following limitations exist in version 8.17:
- Only text or keyword fields can be used with MATCH.
- MATCH can be combined with other conditions as part of an AND expression, but not as part of an OR expression. WHERE match(message, "connection lost") AND length(message) > 10 can be used, but not WHERE match(message, "connection lost") OR length(message) > 10.
We're actively working to lift these restrictions so you can use the full power of MATCH. You can see the progress in Elastic Cloud Serverless, which is continuously up to date with our new work. This is the current status on Serverless for the limitations above:
- MATCH can be used with almost any field type, and converts string values automatically to any type.
- MATCH and the other full text functions can be used in OR conditions when all elements of the conditions are full text functions.
Check the latest documentation to see what the status for MATCH is on Serverless.
Match operator
The match operator (:) is equivalent to the match function above, but it offers a more succinct syntax:
FROM logs
| WHERE message:"connection lost"
It is more convenient to use the match operator, but you can use whichever makes more sense to you.
Match operator has the same limitations as the match function.
Query string function
Query string function (QSTR) uses the query string syntax to perform complex queries on one or several fields:
FROM logs
| WHERE qstr("message: \"connection lost\" AND tags: production")
Query string syntax lets you specify powerful full text options and operations, including fuzzy search, proximity searches, and the use of boolean operators. Refer to the docs for more details.
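The sketches below illustrate those three options using standard query string syntax. The logs index and the message and tags fields are assumed names, and each query is meant to be run on its own.

```esql
// Fuzzy search: also matches terms within one edit of "connection".
FROM logs
| WHERE qstr("message:connection~1")

// Proximity search: phrases use double quotes, escaped here as \" inside the ES|QL string.
// ~3 lets the two words be up to three positions apart.
FROM logs
| WHERE qstr("message:\"connection lost\"~3")

// Boolean operators and grouping.
FROM logs
| WHERE qstr("message:(timeout OR refused) AND NOT tags:staging")
```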
Query string is a very powerful tool, but it currently has some limitations, very similar to those of the MATCH function:
- It does not provide a way to specify parameters, such as the match type or the default fields to search.
- It can only be used in WHERE clauses.
- It can't be used after STATS or LIMIT commands.
- It can't be used after commands that modify columns, like SHOW, ROW, DISSECT, DROP, ENRICH, EVAL, GROK, KEEP, MV_EXPAND, or RENAME.
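In practice this means placing QSTR right after the source command and reshaping the results afterwards. A minimal sketch, assuming a hypothetical logs index with @timestamp and message fields:

```esql
// Filter first with QSTR, then add and select columns.
FROM logs
| WHERE qstr("message:timeout")
| EVAL message_length = LENGTH(message)
| KEEP @timestamp, message, message_length
```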
As with the MATCH function, there is a limitation for OR conditions. QSTR can be combined with other conditions as part of an AND expression, but not as part of an OR expression. WHERE qstr("message: \"connection lost\"") AND length(message) > 10 can be used, but not WHERE qstr("message: \"connection lost\"") OR length(message) > 10.
Elastic Cloud Serverless allows using OR conditions when all elements of the conditions are full text functions. Check the latest documentation to see what the status for QSTR is on Serverless.
What's next
What's coming for full text search? Quite a few things:
- Adding tuning options for the behaviour of MATCH and QSTR functions
- An additional KQL function that can be used to port your existing Kibana queries to ES|QL
- Removing the current limitations for full text functions
We're also working to add scoring, so you can start using ES|QL for relevance matching and not just for filtering. This is quite exciting, as it will define what the future of text search will look like in Elasticsearch!
Give it a try
MATCH and QSTR are available as tech preview on Elasticsearch 8.17, and of course they are always up to date in Serverless.
What are you looking for in terms of text filtering? Let us know your feedback!
Happy full text filtering!