Filtering in ES|QL using full text search

Elasticsearch 8.17 added the match and qstr functions to ES|QL, which can be used to perform full text filtering. This article describes what they do, how they can be used, how they differ from the existing text filtering methods, their current limitations, and future improvements.

ES|QL now includes full text functions that can be used to filter your data using text queries. We will review the available text filtering methods and understand why these functions provide a better alternative. We will also look at the future improvements for full text functions in ES|QL.

Filtering text with ES|QL

Text data in logs is critical for understanding, monitoring, and troubleshooting systems and applications. The unstructured nature of text allows for flexibility in capturing all sorts of information.

Because text is unstructured, we need ways of isolating specific patterns, keywords, or phrases. Searching for an error message, narrowing down results using tags, or looking for a specific host name are things that we do all the time to refine our results and eventually obtain the information we're looking for.

ES|QL provides different methods to help you work with text. Elasticsearch 8.17 adds the full text functions match and qstr in tech preview to help tackle more complex search use cases.

Limitations of text filtering

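ES|QL already provided text filtering capabilities, including string equality, the STARTS_WITH and ENDS_WITH functions, pattern matching with LIKE, and regular expressions with RLIKE.

Text filtering is useful, but it can fall short on text-oriented use cases: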

Multivalued fields

Using ES|QL functions with multivalued fields can be tricky: unless documented otherwise, functions return null when applied to a multivalued field.

If you need to apply a function to a multivalued field, you first need to collapse its values into a single string using MV_CONCAT, which takes the field and a delimiter, and then match against that string:

FROM logs
| EVAL all_tags = MV_CONCAT(tags, ",")
| WHERE all_tags LIKE "*production*"

Analyzed text

Analyzers are incredibly useful for full text search as they allow transforming text. They allow us to extract and modify the indexed text, and modify the queries so we can maximize the possibility of finding what we're looking for.

Text is not analyzed when using text filtering. This means, for example, that you need to match the text case when searching, or create regexes or patterns that account for possible case differences.

This becomes more problematic when searching multilingual text (where you can't rely on ASCII folding), matching on parts of paths (path hierarchy), or ignoring stopwords.
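
To make the case problem concrete, here is a minimal sketch of the usual workaround, assuming the same logs index with a message field used elsewhere in this article. It lowercases the field with TO_LOWER before applying a LIKE pattern, so "Connection lost" and "connection lost" both pass the filter. The message_lower column is a name introduced here for illustration:

```esql
FROM logs
// Normalise case by hand, since LIKE compares the raw string and is case-sensitive
| EVAL message_lower = TO_LOWER(message)
| WHERE message_lower LIKE "*connection lost*"
```

This only handles case. Accents, stopwords, and path segments would each need their own hand-written handling, which is the work an analyzer otherwise does for you.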

Performance

Pattern matching and regexes take time. Lucene can do a lot of the heavy lifting by creating finite automata to match using the indexed terms dictionary, but nonetheless it's a computationally intensive process.

As you can see in our 8.17 release blog, using regular expressions can be 50 to 1000 times slower than using full text functions for text filtering, depending on your data set.

Enter full text functions

Elasticsearch 8.17 and Serverless introduced two new functions as tech preview for text matching: MATCH and query string (abbreviated QSTR).

These functions address some of the limitations that existed for text filtering:

  • They can be used directly on multivalued fields. They will return results when any of the values in a multivalued field matches the query.
  • They use analyzers for text fields. The query will be analyzed using any existing analyzers for the target fields, which will allow matching regardless of case. This also unlocks ASCII folding, removing stopwords, and even using synonyms.
  • They are performant. Instead of relying on pattern matching or regular expressions, they can directly use Lucene index structures to locate specific terms in your data.
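
As a sketch of the first point, assuming the logs index has a multivalued tags field as in the MV_CONCAT example earlier, the filter can be applied to the field directly, with no EVAL step:

```esql
FROM logs
// Keeps a row when any value in tags matches "production"
| WHERE match(tags, "production")
```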

MATCH function

MATCH allows matching a value on a specific field:

FROM logs
| WHERE match(message, "connection lost")

The MATCH function uses a match query under the hood. This means that it creates a boolean query when multiple terms are used, with OR as the default operator for combining them.
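
Because OR is the default operator, the query above also returns messages that contain only "connection" or only "lost". One way to require both terms, sketched here on the assumption that several MATCH calls can be chained with AND like any other condition, is to match each term separately:

```esql
FROM logs
// Each MATCH handles one term; AND requires both to be present in message
| WHERE match(message, "connection") AND match(message, "lost")
```

Note that this requires both terms to appear somewhere in the message, not that they appear next to each other as a phrase.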

The MATCH function currently has some limitations:

  • It does not provide a way to specify parameters. It will use the defaults for the match query.
  • It can only be used in WHERE clauses.
  • It can't be used after a STATS or LIMIT command.

The following limitations exist in version 8.17:

  • Only text or keyword fields can be used with MATCH.
  • MATCH can be combined with other conditions as part of an AND expression, but not as part of an OR expression. WHERE match(message, "connection lost") AND length(message) > 10 can be used, but not WHERE match(message, "connection lost") OR length(message) > 10.

We're actively working to lift these restrictions so you can use the full power of MATCH. You can see the progress in Elastic Cloud Serverless, which is continuously up to date with our new work. This is the current status on Serverless for the limitations above:

  • MATCH can be used with almost any field type, and converts string values automatically to the field's type.
  • MATCH and the other full text functions allow OR conditions when all elements of the condition are full text functions.

Check the latest documentation to see what the status for MATCH is on Serverless.

Match operator

The match operator (:) is equivalent to the match function above, but it offers a more succinct syntax:

FROM logs
| WHERE message:"connection lost"

It is more convenient to use the match operator, but you can use whichever makes more sense to you.

The match operator has the same limitations as the MATCH function.

Query string function

The query string function (QSTR) uses the query string syntax to perform complex queries on one or several fields:

FROM logs
| WHERE qstr("message: \"connection lost\" AND tags: production")

Query string syntax lets you specify powerful full text options and operations, including fuzzy search, proximity search, and boolean operators. Refer to the docs for more details.
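
A few sketches of that syntax, assuming the same logs index with message and tags fields. The misspelled term, the field values, and the distance numbers are illustrative only. Double quotes inside the query are escaped because the whole query is itself an ES|QL string literal:

```esql
// Fuzzy search: terms within one edit of "conection", such as "connection"
FROM logs
| WHERE qstr("message:conection~1")

// Proximity search: "connection" and "lost" near each other, with a slop of 3
FROM logs
| WHERE qstr("message:\"connection lost\"~3")

// Boolean operators with grouping
FROM logs
| WHERE qstr("message:(timeout OR refused) AND NOT tags:staging")
```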

Query string is a very powerful tool, but it currently has some limitations, very similar to those of the MATCH function:

  • It does not provide a way to specify parameters such as the match type or the default fields to search.
  • It can only be used in WHERE clauses.
  • It can't be used after STATS or LIMIT commands.
  • It can't be used after commands that modify columns, like SHOW, ROW, DISSECT, DROP, ENRICH, EVAL, GROK, KEEP, MV_EXPAND, or RENAME.
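
In practice, these ordering restrictions mean the full text filter should come right after FROM, with any column-modifying commands following it. A sketch, assuming the logs index from the earlier examples (message_length is a hypothetical column name introduced for illustration):

```esql
FROM logs
// QSTR runs first, while the columns are still the original index fields
| WHERE qstr("message:timeout")
// Commands that modify columns come afterwards
| EVAL message_length = LENGTH(message)
| KEEP message, message_length
| LIMIT 10
```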

As with the MATCH function, there is a limitation on OR conditions. QSTR can be combined with other conditions as part of an AND expression, but not as part of an OR expression. WHERE qstr("message: \"connection lost\"") AND length(message) > 10 can be used, but not WHERE qstr("message: \"connection lost\"") OR length(message) > 10.

Elastic Cloud Serverless allows using OR conditions when all elements of the conditions are full text functions. Check the latest documentation to see what the status for QSTR is on Serverless.

What's next

What's coming for full text search? Quite a few things:

  • Adding tuning options for the behaviour of MATCH and QSTR functions
  • An additional KQL function that can be used to port your existing Kibana queries to ES|QL
  • Removing the current limitations for full text functions

We're also working to add scoring, so you can start using ES|QL for relevance matching and not just for filtering. This is quite exciting, as it will define what the future of text search looks like in Elasticsearch!

Give it a try

MATCH and QSTR are available as tech preview on Elasticsearch 8.17, and of course they are always up to date in Serverless.

What are you looking for in terms of text filtering? Let us know your feedback!

Happy full text filtering!
