Joining two indexes in Elasticsearch

Explaining how to use the terms query and the enrich processor for joining two indexes in Elasticsearch.

In Elasticsearch, joining two indexes is not as straightforward as it is in traditional SQL databases. However, it is possible to achieve similar results using certain techniques and features provided by Elasticsearch. This article will delve into the process of joining two indexes in Elasticsearch, focusing on the use of the terms query and the enrich processor.

Using the terms query for joining two indexes

The terms query is one of the most effective ways to join two indexes in Elasticsearch. This query is used to retrieve documents that contain one or more exact terms in a specific field. Here’s how you can use it to join two indexes:

  1. First, you need to retrieve the required data from the first index. This can be done using a simple GET request.
  2. Once you have the data from the first index, you can use it to query the second index. This is done using the terms query, where you specify the field and the values you want to match.

Here is an example:

GET /second_index/_search
{
  "query": {
    "terms": {
      "field_in_second_index": ["value1_from_first_index", "value2_from_first_index"]
    }
  }
}

In this example, field_in_second_index is the field in the second index that you want to match with the values from the first index. value1_from_first_index and value2_from_first_index are the values from the first index that you want to match in the second index.

The terms query also provides support to perform the two above steps in a single shot using a technique called terms lookup. Elasticsearch will take care of transparently retrieving the values to match from another index. For example, you have a teams index containing a list of players:

PUT teams/_doc/team1
{
  "players":   ["john", "bill", "michael"]
}
PUT teams/_doc/team2
{
  "players":   ["aaron", "joe", "donald"]
}

Now, it is possible to query a people index for all the people playing in team1, as shown below:

GET people/_search?pretty
{
  "query": {
    "terms": {
        "name" : {
            "index" : "teams",
            "id" : "team1",
            "path" : "players"
        }
    }
  }
}

In the example above, Elasticsearch will transparently retrieve the player names from the team1 document present in the teams index (i.e. “john”, “bill”, “michael”) and find all people documents with a name field that contains any of those values. The equivalent SQL query would be:

SELECT p.* FROM people p
INNER JOIN teams t ON p.name = t.players

Using the enrich processor for joining two indexes

The enrich processor is another powerful tool that can be used to join two indexes in Elasticsearch. This processor enriches the data of incoming documents by adding data from a pre-defined enrich index.

Here’s how you can use the enrich processor to join two indexes:

1. First, you need to create an enrich policy. This policy defines which index to use for enrichment and which field to match on. Here is an example:

PUT /_enrich/policy/my_enrich_policy
{
  "match": {
    "indices": "first_index",
    "match_field": "field_in_first_index",
    "enrich_fields": ["field_to_enrich"]
  }
}

2. Once the policy is created, you need to execute it:

POST /_enrich/policy/my_enrich_policy/_execute

3. After executing the policy, you can use the enrich processor in an ingest pipeline to enrich the data of incoming documents:

PUT /_ingest/pipeline/my_pipeline
{
  "processors": [
    {
      "enrich": {
        "policy_name": "my_enrich_policy",
        "field": "field_in_second_index",
        "target_field": "enriched_field"
      }
    }
  ]
}

In this example, field_in_second_index is the field in the second index that you want to enrich with data from the first index. enriched_field is the new field that will contain the enriched data.

One drawback of this approach is that if the data changes in first_index, the enrich policy needs to be re-executed as the enriched index is not updated or synchronized automatically from the source index it has been built from. However, if first_index is relatively stable, then this approach works great.

Conclusion

In conclusion, while Elasticsearch does not support traditional join operations, it provides features like the terms query and the enrich processor that can be used to achieve similar results. It’s important to note that these methods have their limitations and should be used judiciously based on the specific requirements and the nature of the data.

Ready to try this out on your own? Start a free trial.

Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!

Related content

Elasticsearch search by two fields

April 30, 2025

Elasticsearch search by two fields

Exploring techniques for searching by two fields, including multi-match queries, bool queries, and query-time field boosting.

Understanding Elasticsearch scoring and the Explain API

May 7, 2025

Understanding Elasticsearch scoring and the Explain API

Diving into the scoring mechanism of Elasticsearch and exploring the Explain API.

Removing a node from an Elasticsearch cluster

May 5, 2025

Removing a node from an Elasticsearch cluster

Explaining how to remove a node from an Elasticsearch cluster.

RAG and the value of grounding

May 1, 2025

RAG and the value of grounding

Learn about RAG, grounding and how to reduce hallucinations by connecting an LLM to your documents.

RAG without “AG”?

April 29, 2025

RAG without “AG”?

Learn how to leverage semantic search and ELSER to build a powerful and visually appealing Q&A experience, without using LLMs.

Ready to build state of the art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as your are. Let’s connect and work together to build the magical search experience that will get you the results you want.

Try it yourself