Elasticsearch string contains substring: Advanced query techniques

Explore techniques for querying Elasticsearch to find documents where a field contains a specific substring.

Searching for documents in which a field contains a specific substring is a common requirement in Elasticsearch. This article covers the query_string, match_phrase, and wildcard queries, and shows how analyzers and tokenizers can improve search accuracy.

1. Query string query

The query_string query is a powerful and flexible way to search for documents containing a specific substring. It allows you to use the Lucene query syntax, which provides a wide range of search options. Here’s an example of a query_string query that searches for documents containing the substring “example”:

GET /_search
{
  "query": {
    "query_string": {
      "query": "*example*"
    }
  }
}

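In this example, the asterisks (*) are wildcard characters that match zero or more characters, so the pattern matches any indexed term containing “example”. Because no fields are specified, the query runs against the fields set by index.query.default_field, which defaults to all fields that support term queries. Beware, though: a leading wildcard forces Elasticsearch to examine every term in the index, which can be detrimental to your cluster performance. If you do not need them, you can disable leading wildcards by setting allow_leading_wildcard to false.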
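When the substring comes from user input, characters that are reserved in the Lucene query syntax need to be escaped before the text is wrapped in wildcards. The following is a minimal Python sketch of that step, not an official client helper: the function names and the file_path field are hypothetical, and it assumes the input is a single term without whitespace.

```python
import re

# Characters with special meaning in the Lucene query syntax.
# "<" and ">" cannot be escaped, so they are stripped instead.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_query_string(text: str) -> str:
    """Escape user input so query_string treats it as literal characters."""
    text = text.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", text)


def contains_query(substring: str, fields: list[str]) -> dict:
    """Build a query_string request body for terms containing `substring`."""
    pattern = f"*{escape_query_string(substring)}*"
    return {"query": {"query_string": {"query": pattern, "fields": fields}}}


# "file_path" is a hypothetical field name used only for illustration.
body = contains_query("logs/app", ["file_path"])
print(body["query"]["query_string"]["query"])
```

Passing an explicit fields list also keeps the wildcard from being expanded against every field in the index.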

2. Match phrase query

The match_phrase query is another option when the substring you are looking for is a sequence of whole words. It analyzes the search text and matches documents in which the resulting terms appear in the field in the same order; it does not match partial words. The optional slop parameter allows the terms to be a number of positions apart, and a slop of 2 or more also permits transposed terms. Here’s an example of a match_phrase query that searches for documents containing the phrase “quick brown”:

GET /_search
{
  "query": {
    "match_phrase": {
      "field_name": "quick brown"
    }
  }
}

In this example, the match_phrase query will return documents containing the exact phrase “quick brown” in the specified field.
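To see why match_phrase works on whole words rather than arbitrary character sequences, here is a simplified Python model of phrase matching with a slop of 0. It is only an illustration: the tokenize function is a rough stand-in for the standard analyzer, and none of this is Elasticsearch's actual implementation.

```python
import re


def tokenize(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def phrase_matches(field_value: str, phrase: str) -> bool:
    """Return True if the phrase's tokens appear consecutively, in order."""
    doc, query = tokenize(field_value), tokenize(phrase)
    if not query:
        return False
    return any(
        doc[i:i + len(query)] == query
        for i in range(len(doc) - len(query) + 1)
    )


text = "The quick brown fox"
print(phrase_matches(text, "quick brown"))  # whole words, adjacent
print(phrase_matches(text, "quick bro"))    # "bro" is not a whole token
print(phrase_matches(text, "brown quick"))  # wrong order without slop
```

In this model only the first call matches, which mirrors the point that a partial word such as “bro” needs a wildcard or n-gram approach instead.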

3. Wildcard query

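The wildcard query is a term-level query and a simple way to search for documents containing a specific substring. Its pattern supports two wildcard characters: * matches zero or more characters and ? matches any single character. The pattern is compared against the terms stored in the index, so on a keyword field it is matched against the whole field value, while on an analyzed text field it is matched against each token separately. Here’s an example of a wildcard query that searches for documents containing the substring “exam”: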

GET /_search
{
  "query": {
    "wildcard": {
      "field_name": "*exam*"
    }
  }
}

In this example, the wildcard query will return documents in which a term in the specified field contains the substring “exam”. As with query_string, avoid beginning a pattern with * or ? where possible: a leading wildcard increases the number of terms Elasticsearch has to check and can slow down your search performance.
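Because the wildcard query compares its pattern with indexed terms, the same pattern can behave differently on keyword and text fields. The sketch below is a simplified Python model of that behavior, not Elasticsearch code: the helper names are hypothetical, the text-field tokens are approximated by lowercasing and splitting on whitespace, and the case_insensitive argument mirrors the wildcard query option of the same name.

```python
import re


def wildcard_to_regex(pattern: str, case_insensitive: bool = False) -> re.Pattern:
    """Translate * (any run of characters) and ? (one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)


def any_term_matches(terms: list[str], pattern: str,
                     case_insensitive: bool = False) -> bool:
    """A wildcard pattern has to match an entire indexed term."""
    regex = wildcard_to_regex(pattern, case_insensitive)
    return any(regex.fullmatch(term) for term in terms)


value = "Final Exam Results"
keyword_terms = [value]              # keyword field: the whole value is one term
text_terms = value.lower().split()   # text field: rough stand-in for analysis

print(any_term_matches(text_terms, "*exam*"))                            # True
print(any_term_matches(keyword_terms, "*exam*"))                         # False
print(any_term_matches(keyword_terms, "*exam*", case_insensitive=True))  # True
print(any_term_matches(text_terms, "*al ex*"))                           # False
```

The last call shows that a pattern spanning a word boundary cannot match on the tokenized field in this model, because no single token contains the space.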

4. Analyzers and tokenizers

To improve the accuracy of substring searches, you can use analyzers and tokenizers to process the text in your documents. An analyzer turns text into the tokens that are indexed and searched. It consists of optional character filters, exactly one tokenizer, which splits the text into individual tokens, and optional token filters, which modify those tokens.

For example, you can use the n-gram tokenizer to create tokens of varying lengths from the input text. Each token is a substring of the input, so Elasticsearch can match substrings of different lengths with an ordinary term lookup instead of a wildcard scan, at the cost of a larger index. Here’s an example of how to create a custom analyzer with an n-gram tokenizer:

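PUT /my_index
{
  "settings": {
    "index": {
      "max_ngram_diff": 2
    },
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 5
        }
      }
    }
  }
}

In this example, the custom analyzer uses an n-gram tokenizer with a minimum token length of 3 and a maximum token length of 5. Because max_gram exceeds min_gram by 2, the index.max_ngram_diff setting, which defaults to 1, is raised to 2; without it, Elasticsearch rejects the index creation request. To use the analyzer, assign it to a text field in the index mapping so that it is applied when documents are indexed and when the field is searched.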
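To get a feel for which terms such a tokenizer would produce, the following Python sketch lists the n-grams of a single word for the same min_gram and max_gram values. It is a simplified model that treats the input as one string, not the tokenizer's actual implementation, and the ngrams function is a hypothetical helper.

```python
def ngrams(text: str, min_gram: int = 3, max_gram: int = 5) -> list[str]:
    """List every substring of text with a length from min_gram to max_gram."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


grams = ngrams("example")
print(grams)
print("xamp" in grams)     # a 4-character substring becomes its own term
print("example" in grams)  # 7 characters exceeds max_gram, so no single term
```

The model also shows the limits of the chosen lengths: substrings shorter than min_gram or longer than max_gram are not produced as single terms, so those values should reflect the substrings you expect users to search for.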

Conclusion

Elasticsearch provides several advanced techniques for querying documents containing specific substrings. By using query_string, match_phrase, and wildcard queries, as well as custom analyzers and tokenizers, you can improve the accuracy and flexibility of your substring searches. Experiment with these techniques to find the best approach for your specific use case and dataset.


