In this blog post we'll take another deep dive into retrievers. We've already covered them in previous posts, from their initial introduction to semantic reranking with retrievers. Now we're happy to announce that retrievers are becoming generally available with Elasticsearch 8.16.0. In this post we'll take a technical tour of how we implemented them and discuss the newly available capabilities!
Retrievers
The main concept of a retriever remains the same as in the initial release: retrievers are a framework that provides basic building blocks which can be stacked hierarchically to build complex multi-stage retrieval and ranking pipelines. For example, here is a simple standard retriever that just brings back all documents:
GET retrievers_example/_search
{
"retriever": {
"standard": {
"query": {
"match_all": {}
}
}
}
}
Pretty straightforward, right? In addition to the standard retriever, which is essentially just a wrapper around the standard query search API element, we also support the following types:
- knn: returns the top documents from a kNN (k-nearest neighbor) search
- rrf: combines results from different retrievers based on the RRF (Reciprocal Rank Fusion) ranking formula
- text_similarity_reranker: reranks the top results of a nested retriever using a rerank type inference endpoint
More detailed information along with the specific parameters for each retriever can also be found in the Elasticsearch documentation.
Let's briefly go through some of the technical details first. They will help us understand the architecture, what has changed, and why the previous limitations have now been lifted!
Technical drill down
One of the most important (and most requested) things that we wanted to address was the ability to use any retriever at any nesting level. Whether this means having two or more text_similarity_reranker retrievers stacked together, an rrf retriever operating on top of another rrf along with a text_similarity_reranker, or any other combination and nesting you can think of, we wanted to make sure that it could be expressed with retrievers!
To account for this, we have introduced some significant changes to the retriever execution plan. Up until now, retrievers were evaluated as part of the standard search execution flow, where (in a simplified scenario for illustration purposes) we reach out to the shards twice:
- once for querying the shards and bringing back from + size documents from each shard, and
- once for fetching all field data and performing any additional operations (e.g. highlighting) for the true top [from, from+size] results.
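To make the two round trips concrete, here is a minimal Python sketch of the simplified flow described above. It is only an illustration under stated assumptions: the shard contents, the scores, and the function names query_phase and fetch_phase are hypothetical, and the real coordinator does considerably more work.

```python
from heapq import nlargest

def query_phase(shards, frm, size):
    """Round trip 1: every shard returns its own top (from + size) hits."""
    candidates = []
    for shard_id, docs in shards.items():
        hits = [(score, shard_id, doc_id) for doc_id, score in docs.items()]
        candidates.extend(nlargest(frm + size, hits))
    return candidates

def fetch_phase(candidates, frm, size):
    """Round trip 2: keep only the global [from, from + size) window to fetch."""
    ranked = sorted(candidates, reverse=True)
    return [(shard_id, doc_id) for _, shard_id, doc_id in ranked[frm:frm + size]]

# Hypothetical shard contents: doc id -> score.
shards = {
    0: {"a": 0.9, "b": 0.4, "c": 0.1},
    1: {"d": 0.8, "e": 0.7, "f": 0.2},
}
candidates = query_phase(shards, frm=1, size=2)
page = fetch_phase(candidates, frm=1, size=2)
print(page)
```

Each shard has to over-fetch from + size hits because the coordinator cannot know in advance which shard holds the globally best documents.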
This is a nice linear execution flow that is (relatively) easy to follow, but it introduces some significant limitations if we want to execute multiple queries, operate on different result sets, etc. To work around this, we have moved to an eager evaluation of all sub-retrievers of a retriever pipeline at the very early stages of query execution. This means that, if needed, we recursively rewrite any retriever query to a simpler form, the specifics of which depend on the retriever type.
- For non-compound retrievers, we rewrite them much as we would a standard query, as they can still follow the linear execution plan.
- For compound retrievers, i.e. retrievers that operate on top of other retriever(s), we flatten them to a single result set of size rank_window_size, which is essentially a list of <doc, shard> tuples that represents the top ranked documents for this retriever.
Let's see what this actually looks like, by working through the following (rather complex) retriever request:
{
"retriever": {
"rrf": { [1]
"retrievers": [
{
"knn": { [2]
"field": "emb1",
"query_vector_builder": {
"text_embedding": {
"model_id": "my-text-embedding-model",
"model_text": "LLM applications in information retrieval"
}
}
}
},
{
"standard": { [3]
"query": {
"term": {
"topic": "science"
}
}
}
},
{
"rrf": { [4]
"retrievers": [
{
"standard": { [5]
"query": {
"range": {
"year": {
"gte": 2020
}
}
}
}
},
{
"knn": { [6]
"field": "emb2",
"query_vector_builder": {
"text_embedding": {
"model_id": "my-text-embedding-model",
"model_text": "Vector scale on production systems"
}
}
}
}
],
"rank_window_size": 100,
"rank_constant": 10
}
}
],
"rank_window_size": 10,
"rank_constant": 1
}
}
}
The rrf retriever above is a compound one, as it operates on the results of other retrievers, so we'll rewrite it to a simpler, flattened list of <doc, shard> tuples, where each tuple specifies a document and the shard that it was found on. This rewrite also enforces a strict ranking, so no other sort options are currently supported.
Let's proceed now to identify all components and describe the process of how this will be evaluated:
[1] The top-level rrf retriever. This is the parent of all sub-retrievers and will be rewritten and evaluated last, as we first need to know the top 10 results (based on rank_window_size) from each of its sub-retrievers.
[2] This knn retriever is the first child of the top-level rrf retriever and uses an embedding service (my-text-embedding-model) to compute the actual query vector that will be used. It will be rewritten to the usual knn query by making an async request to the embedding service to compute the vector for the given model_text.
[3] A standard retriever that is also a child of the top-level rrf retriever and returns all documents matching the topic: science query.
[4] The last child of the top-level rrf retriever, which is itself an rrf retriever that needs to be flattened.
[5] [6] Similar to [2] and [3], these retrievers are direct children of an rrf retriever, in this case [4]. We fetch the top 100 results for each one (based on the rank_window_size of rrf retriever [4]), combine them using the rrf formula, and then rewrite the result to a flattened <doc, shard> list of the true top 100 results.
The updated execution flow for retrievers is now as follows:
- We'll start by rewriting all leaves that we can. This means that we'll rewrite the knn retrievers [2] and [6] to compute the query vector, and once we have that we can move up one level in the tree.
- At the next rewrite step, we are ready to evaluate the nested rrf retriever [4], which we will eventually rewrite to a flattened RankDocsQuery query (i.e. a list of <doc, shard> tuples).
- Finally, all inner rewrite steps for the top-level rrf retriever [1] will have taken place, so we are ready to combine and rank the true top 10 results as requested. Even this top-level rrf retriever will rewrite itself to a flattened RankDocsQuery, which will later be used to proceed with the standard linear search execution flow.
Visualizing all the above, we have:
Looking at the example above, we can see how a hierarchical retriever tree is asynchronously rewritten to just a simple RankDocsQuery. This simplification gives us the nice (and desired!) side effect of eventually executing a normal request with explicit ranking, and in addition to that we can also perform any complementary operations we choose.
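The bottom-up flattening described above can be sketched in a few lines of Python. This is a toy model, not the actual Elasticsearch implementation: the "leaf" keys and the leaf_results lookup are hypothetical stand-ins for executing knn and standard retrievers on the shards, the rewrite here is synchronous, and tie-breaking by (doc, shard) is an assumption made only to keep the output deterministic.

```python
def rrf_combine(ranked_lists, rank_window_size, rank_constant):
    """Fuse ranked (doc, shard) lists with RRF and keep the top rank_window_size."""
    scores = {}
    for ranked in ranked_lists:
        for rank, key in enumerate(ranked[:rank_window_size], start=1):
            scores[key] = scores.get(key, 0.0) + 1.0 / (rank + rank_constant)
    # Highest score first; ties broken by (doc, shard) for determinism (assumption).
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return ordered[:rank_window_size]

def rewrite(retriever, leaf_results):
    """Recursively flatten a retriever tree into a ranked list of (doc, shard) tuples."""
    if "rrf" in retriever:
        spec = retriever["rrf"]
        children = [rewrite(child, leaf_results) for child in spec["retrievers"]]
        return rrf_combine(children, spec["rank_window_size"], spec["rank_constant"])
    # Leaf retriever: hypothetical lookup instead of real shard execution.
    return leaf_results[retriever["leaf"]]

# Same shape as the request above: rrf[1] over knn[2], standard[3] and rrf[4] over [5], [6].
tree = {"rrf": {
    "retrievers": [
        {"leaf": "knn_emb1"},
        {"leaf": "standard_topic"},
        {"rrf": {
            "retrievers": [{"leaf": "standard_year"}, {"leaf": "knn_emb2"}],
            "rank_window_size": 100,
            "rank_constant": 10,
        }},
    ],
    "rank_window_size": 10,
    "rank_constant": 1,
}}
leaf_results = {
    "knn_emb1": [("d1", 0), ("d2", 1), ("d3", 0)],
    "standard_topic": [("d2", 1), ("d3", 0)],
    "standard_year": [("d3", 0), ("d1", 0)],
    "knn_emb2": [("d3", 0), ("d4", 1)],
}
flattened = rewrite(tree, leaf_results)
print(flattened)
```

The nested rrf is flattened first, and its ranked list is then treated by the parent exactly like the output of any leaf retriever, which is what makes arbitrary nesting possible.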
Playing with the (golden) retrievers!
As we briefly mentioned above, with the rework in place, we can now support a plethora of additional search features! In this section we'll go through some examples and use-cases, but more can also be found in the documentation.
We'll start with the most coveted one which is composability, i.e. the option to have any retriever at any level of the retriever tree.
Composability
In the following example, we want to perform a semantic query (using an embedding service like ELSER) and then merge those results with the results of a knn query, using rrf. Finally, we want to rerank the merged results with the text_similarity_reranker retriever, which uses a reranker model. The retriever to express the above would look like this:
GET /retrievers_example/_search
{
"retriever": {
"text_similarity_reranker": {
"retriever": {
"rrf": {
"retrievers": [
{
"standard": {
"query": {
"semantic": {
"field": "inference_field",
"query": "Can I use generative AI to identify user intent and improve search relevance?"
}
}
}
},
{
"knn": {
"field": "vector",
"query_vector": [
0.23,
0.67,
0.89
],
"k": 3,
"num_candidates": 5
}
}
],
"rank_window_size": 10,
"rank_constant": 1
}
},
"field": "text",
"inference_text": "LLM applications on production search applications",
"inference_id": "my-reranker-model",
"rank_window_size": 10
}
},
"_source": [
"text",
"topic"
]
}
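Since every retriever is just a nested JSON object, composed requests like the one above can also be assembled programmatically. The following Python sketch uses small helper functions (standard, knn, rrf and text_similarity_reranker are hypothetical helpers written for this post, not part of any client library) that return plain dictionaries using only the parameters shown in the request above.

```python
import json

def standard(query):
    return {"standard": {"query": query}}

def knn(field, query_vector, k, num_candidates):
    return {"knn": {"field": field, "query_vector": query_vector,
                    "k": k, "num_candidates": num_candidates}}

def rrf(retrievers, rank_window_size, rank_constant):
    return {"rrf": {"retrievers": retrievers,
                    "rank_window_size": rank_window_size,
                    "rank_constant": rank_constant}}

def text_similarity_reranker(retriever, field, inference_text,
                             inference_id, rank_window_size):
    return {"text_similarity_reranker": {
        "retriever": retriever,
        "field": field,
        "inference_text": inference_text,
        "inference_id": inference_id,
        "rank_window_size": rank_window_size,
    }}

# Any helper can wrap any other, mirroring the "any retriever at any level" idea.
body = {
    "retriever": text_similarity_reranker(
        rrf(
            [
                standard({"semantic": {
                    "field": "inference_field",
                    "query": "Can I use generative AI to identify user intent and improve search relevance?",
                }}),
                knn("vector", [0.23, 0.67, 0.89], k=3, num_candidates=5),
            ],
            rank_window_size=10,
            rank_constant=1,
        ),
        field="text",
        inference_text="LLM applications on production search applications",
        inference_id="my-reranker-model",
        rank_window_size=10,
    ),
    "_source": ["text", "topic"],
}
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request with whichever client you prefer.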
Aggregations
Recall that with the rework we discussed, we rewrite a compound retriever to just a RankDocsQuery (i.e. a flattened, explicitly ranked result list). This, however, does not prevent us from computing aggregations, as we also keep track of the source queries that were part of a compound retriever. This means that in the example below we can fall back to the nested standard retrievers to properly compute aggregations for the topic field, based on the union of the results of the two nested retrievers.
GET retrievers_example/_search
{
"retriever": {
"rrf": {
"retrievers": [
{
"standard": {
"query": {
"range": {
"year": {
"gt": 2023
}
}
}
}
},
{
"standard": {
"query": {
"term": {
"topic": "elastic"
}
}
}
}
],
"rank_window_size": 10,
"rank_constant": 1
}
},
"_source": [
"text",
"topic"
],
"aggs": {
"topics": {
"terms": {
"field": "topic"
}
}
}
}
So in the example above, we'll compute a terms aggregation on the topic field over all documents where either the year field is greater than 2023 or the topic is elastic.
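As a rough mental model of the "union" semantics, the Python sketch below counts topics over every document that matches at least one of the two source queries. The documents are made up for illustration, and the sketch says nothing about how Elasticsearch actually executes the aggregation.

```python
from collections import Counter

# Hypothetical documents, for illustration only.
docs = [
    {"id": 1, "topic": "elastic", "year": 2024},
    {"id": 2, "topic": "ai", "year": 2024},
    {"id": 3, "topic": "elastic", "year": 2020},
    {"id": 4, "topic": "science", "year": 2019},
]

def matches_range(doc):
    """Mirrors the range query: year > 2023."""
    return doc["year"] > 2023

def matches_term(doc):
    """Mirrors the term query: topic == elastic."""
    return doc["topic"] == "elastic"

# Aggregations see the union of both source queries, not only the top ranked hits.
matched = [d for d in docs if matches_range(d) or matches_term(d)]
topic_counts = Counter(d["topic"] for d in matched)
print(topic_counts)
```

Document 4 matches neither query, so its topic does not show up in the buckets.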
Collapsing
In addition to the aggregation option we discussed above, we can now also collapse results, as we would with a standard query request. In the following example, we compute the top 10 results of the rrf retriever and then collapse them by the year field. The main difference from standard searches is that here we collapse just the top rank_window_size results, and not the results within the nested retrievers.
GET /retrievers_example/_search
{
"retriever": {
"rrf": {
"retrievers": [
{
"text_similarity_reranker": {
"retriever": {
"standard": {
"query": {
"term": {
"topic": "ai"
}
}
}
},
"field": "text",
"inference_text": "Can I use generative AI to identify user intent and improve search relevance?",
"rank_window_size": 10,
"inference_id": "my-reranker-model"
}
},
{
"knn": {
"field": "vector",
"query_vector":
[
0.23,
0.67,
0.89
],
"k": 3,
"num_candidates": 5
}
}
],
"rank_window_size": 10,
"rank_constant": 1
}
},
"collapse": {
"field": "year",
"inner_hits": {
"name": "year_results",
"_source": [
"text",
"year"
]
}
},
"_source": [
"text",
"topic"
]
}
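The key point above is that collapsing is applied to the final ranked window only. A simplified Python model of that behavior, with made-up documents and without modeling inner_hits, could look like this:

```python
def collapse(ranked_docs, field):
    """Keep only the best-ranked document for each distinct value of `field`."""
    seen = set()
    collapsed = []
    for doc in ranked_docs:  # ranked_docs is already ordered best-first
        key = doc[field]
        if key not in seen:
            seen.add(key)
            collapsed.append(doc)
    return collapsed

# Hypothetical top rank_window_size results of the rrf retriever, best first.
top_window = [
    {"id": "a", "year": 2024},
    {"id": "b", "year": 2023},
    {"id": "c", "year": 2024},
    {"id": "d", "year": 2022},
]
collapsed = collapse(top_window, "year")
print([doc["id"] for doc in collapsed])
```

Documents that never made it into the top rank_window_size results are not considered at all, even if they would have formed a group of their own.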
Pagination
As also specified in the docs, compound retrievers support pagination. There is a significant difference from standard queries: similarly to collapse above, the rank_window_size parameter defines the whole result set that we can page through. This means that if from + size > rank_window_size, we bring no results back (but we still return aggregations).
GET /retrievers_example/_search
{
"retriever": {
"rrf": {
"retrievers": [
{
"standard": {
"query": {
"term": {
"topic": "elastic"
}
}
}
},
{
"knn": {
"field": "vector",
"query_vector":
[
0.23,
0.67,
0.89
],
"k": 3,
"num_candidates": 5
}
}
],
"rank_window_size": 10,
"rank_constant": 1
}
},
"from": 2,
"size": 2,
"_source": [
"text",
"topic"
]
}
In the example above, we would compute the top 10 results (as defined in rrf's rank_window_size) from the combination of the two nested retrievers (standard and knn), and then we'd perform pagination by consulting the from and size parameters. So, in this case, we'd skip the first 2 results (from) and pick the next 2 (size).
Consider now a different scenario where, in the same query above, we instead have from: 10 and size: 2. Given that rank_window_size is 10, and that these are all the results we can paginate over, requesting 2 results after skipping the first 10 falls outside the navigable result set, so we'd get back empty results. Additional examples and a more detailed breakdown can also be found in the documentation for the rrf retriever.
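The two scenarios above can be expressed as a small Python sketch. It follows the rule as stated in this post (a page is empty whenever from + size exceeds rank_window_size); the function name paginate and the document ids are hypothetical.

```python
def paginate(ranked, rank_window_size, frm, size):
    """Page through the rank_window_size-sized result set of a compound retriever."""
    window = ranked[:rank_window_size]
    if frm + size > rank_window_size:
        # The requested page falls outside the window, so nothing is returned.
        return []
    return window[frm:frm + size]

# Hypothetical final rrf ranking, best first: doc1 .. doc10.
ranked = [f"doc{i}" for i in range(1, 11)]

print(paginate(ranked, rank_window_size=10, frm=2, size=2))   # skips 2, returns the next 2
print(paginate(ranked, rank_window_size=10, frm=10, size=2))  # outside the window
```

If deeper pages are needed, the window itself has to grow, i.e. rank_window_size must be increased on the compound retriever.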
Explain
We know that with great power comes great responsibility. Given that we can now combine retrievers in arbitrary ways, it could be rather difficult to understand why a result was eventually returned first, and how to optimize our retrieval strategy. For this very specific reason, we have worked to ensure that the explain output of a retriever request (i.e. by specifying explain: true) will convey all necessary information from all sub-retrievers, so that we can have a proper understanding of all the factors that contributed to the final ranking of a result. Taking the rather complex query in the Collapsing section, the explain for the first result looks like this:
{
"_explanation":{
"value": 0.8333334,
"description": "sum of:",
"details": [
{
"value": 0.8333334,
"description": "rrf score: [0.8333334] computed for initial ranks [2, 1] with rankConstant: [1] as sum of [1 / (rank + rankConstant)] for each query",
"details": [
{
"value": 2,
"description": "rrf score: [0.33333334], for rank [2] in query at index [0] computed as [1 / (2 + 1)], for matching query with score",
"details": [
{
"value": 0.0011925492,
"description": "text_similarity_reranker match using inference endpoint: [my-awesome-rerank-model] on document field: [text] matching on source query ",
"details": [
{
"value": 0.3844723,
"description": "weight(topic:ai in 1) [PerFieldSimilarity], result of:",
"details":
[
...
]
}
]
}
]
},
{
"value": 1,
"description": "rrf score: [0.5], for rank [1] in query at index [1] computed as [1 / (1 + 1)], for matching query with score",
"details":
[
{
"value": 1,
"description": "doc [1] with an original score of [1.0] is at rank [1] from the following source queries.",
"details":
[
{
"value": 1,
"description": "found vector with calculated similarity: 1.0",
"details":
[]
}
]
}
]
}
]
}
]
}
}
Still a bit verbose, but it conveys all necessary information on why a document is at a specific position. For the top-level rrf retriever, we have 2 details entries, one for each of its nested retrievers. The first one is for the text_similarity_reranker retriever, where the explain output shows the weight for the rerank operation, and the second one is for the knn query, informing us of the doc's computed similarity with the query vector. It might take a bit to get familiar with, but each retriever makes sure to output all the information you might need to evaluate and optimize your search scenario!
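As a quick sanity check, the top-level score in the explain output can be reproduced by hand from the formula it prints, [1 / (rank + rankConstant)] summed over each query. The Python snippet below does exactly that for the initial ranks [2, 1] and rankConstant 1; the helper name rrf_score is hypothetical.

```python
import math

def rrf_score(ranks, rank_constant):
    """Sum of 1 / (rank + rank_constant) over the rank a document got in each query."""
    return sum(1.0 / (rank + rank_constant) for rank in ranks)

# Rank 2 in the text_similarity_reranker results, rank 1 in the knn results.
score = rrf_score([2, 1], rank_constant=1)

# 1 / (2 + 1) + 1 / (1 + 1) = 0.3333... + 0.5
print(score)
print(math.isclose(score, 0.8333334, rel_tol=1e-6))
```

The tiny difference in the last digits is only due to the explain output being printed with single-precision floats.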
Conclusion
That's all for now! We hope you stayed with us until the end and enjoyed this topic. We're really excited about the release of the retriever framework and all the new use cases that we can now support! Retrievers were built to support everything from very simple searches to advanced RAG and hybrid search scenarios. Watch this space, as more will be available soon!