In our previous blog post we introduced the redesigned-from-scratch retrievers framework, which enables the creation of complex ranking pipelines. We also explored how the Reciprocal Rank Fusion (RRF) retriever enables hybrid search by merging results from different queries. While RRF is easy to implement, it has a notable limitation: it focuses purely on relative ranks, ignoring actual scores. This makes fine-tuning and optimization a challenge.
Meet the linear retriever!
In this post, we introduce the linear retriever, our latest addition for supporting hybrid search! Unlike rrf, the linear retriever calculates a weighted sum across all queries that matched a document. This approach preserves the relative importance of each document within a result set while allowing precise control over each query’s influence on the final score. As a result, it provides a more intuitive and flexible way to fine-tune hybrid search.
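In other words, the final score of a document is conceptually a weighted sum of the form final_score(doc) = w1 * score1(doc) + w2 * score2(doc) + ..., where each wi is the weight assigned to a sub-retriever and scorei(doc) is the score that sub-retriever produced for the document (a query that did not match the document contributes nothing). The notation here is ours, just to make the idea explicit.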
Defining a linear retriever where the final score will be computed as final_score = 5 * knn_score + 1.5 * standard_score is as simple as:
GET linear_retriever_blog/_search
{
  "retriever": {
    "linear": {
      "retrievers": [
        {
          "retriever": {
            "knn": {
              ...
            }
          },
          "weight": 5
        },
        {
          "retriever": {
            "standard": {
              ...
            }
          },
          "weight": 1.5
        }
      ]
    }
  }
}
Notice how simple and intuitive it is? (And really similar to rrf!) This configuration allows you to precisely control how much each query type contributes to the final ranking, unlike rrf, which relies solely on relative ranks.
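For a more concrete picture, the elided retriever bodies might look something like the following; the index name, the text_embedding and text fields, the query vector, and the query text are all hypothetical placeholders to adapt to your own mapping:

GET linear_retriever_blog/_search
{
  "retriever": {
    "linear": {
      "retrievers": [
        {
          "retriever": {
            "knn": {
              "field": "text_embedding",
              "query_vector": [0.12, -0.45, 0.91],
              "k": 10,
              "num_candidates": 100
            }
          },
          "weight": 5
        },
        {
          "retriever": {
            "standard": {
              "query": {
                "match": {
                  "text": "hybrid search with retrievers"
                }
              }
            }
          },
          "weight": 1.5
        }
      ]
    }
  }
}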
One caveat remains: knn scores may be strictly bounded, depending on the similarity metric used. For example, with cosine similarity or the dot product of unit-normalized vectors, scores will always lie within the [0, 1] range. In contrast, bm25 scores are less predictable and have no clearly defined bounds.
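As a point of reference, at the time of writing Elasticsearch maps cosine similarity into a score as (1 + cosine(query, doc)) / 2, which by construction lands in [0, 1] (and similarly for the dot product of unit-normalized vectors). A bm25 score, on the other hand, grows with term frequency and inverse document frequency and has no fixed upper bound.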
Scaling the scores: kNN vs BM25
One challenge of hybrid search is that different retrievers produce scores on different scales. Consider for example the following scenario:
Query A scores:
| | doc1 | doc2 | doc3 | doc4 |
|---|---|---|---|---|
| knn | 0.347 | 0.35 | 0.348 | 0.346 |
| bm25 | 100 | 1.5 | 1 | 0.5 |
Query B scores:
| | doc1 | doc2 | doc3 | doc4 |
|---|---|---|---|---|
| knn | 0.347 | 0.35 | 0.348 | 0.346 |
| bm25 | 0.63 | 0.01 | 0.3 | 0.4 |
You can see the disparity above: knn scores range between 0 and 1, while bm25 scores can vary wildly. This difference makes it tricky to set static optimal weights for combining the results.
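To see why, assume for a moment (purely as an illustration) that we summed the two scores with equal weights of 1. For Query A, doc1 would get 0.347 + 100 = 100.347 while doc2 would get 0.35 + 1.5 = 1.85, so the bm25 score completely drowns out the knn signal; for Query B, the same weights give doc1 0.347 + 0.63 = 0.977 and doc2 0.35 + 0.01 = 0.36, a far more balanced outcome. No single static weight works well for both queries.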
Normalization to the rescue: the MinMax normalizer
To address this, we’ve introduced an optional minmax normalizer that scales scores, independently for each query, to the [0, 1] range using the following formula: normalized_score = (score - min_score) / (max_score - min_score), where min_score and max_score are the minimum and maximum scores returned for that query.
This preserves the relative importance of each document within a query’s result set, making it easier to combine scores from different retrievers. With normalization, the scores become:
Query A scores:
| | doc1 | doc2 | doc3 | doc4 |
|---|---|---|---|---|
| knn | 0.347 | 0.35 | 0.348 | 0.346 |
| bm25 | 1.00 | 0.01 | 0.005 | 0.000 |
Query B scores:
| | doc1 | doc2 | doc3 | doc4 |
|---|---|---|---|---|
| knn | 0.347 | 0.35 | 0.348 | 0.346 |
| bm25 | 1.00 | 0.000 | 0.465 | 0.645 |
All scores now lie in the [0, 1] range, and optimizing the weighted sum becomes much more straightforward: we now capture each result’s importance relative to its own query, rather than its absolute score, and maintain consistency across queries.
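As a quick sanity check, plugging Query A’s bm25 scores into the formula above (min_score = 0.5, max_score = 100) reproduces the table:

- doc1: (100 - 0.5) / (100 - 0.5) = 1.00
- doc2: (1.5 - 0.5) / (100 - 0.5) ≈ 0.01
- doc3: (1 - 0.5) / (100 - 0.5) ≈ 0.005
- doc4: (0.5 - 0.5) / (100 - 0.5) = 0.000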
Example time!
Let’s go through an example now to showcase what the above looks like and how the linear retriever addresses some of the shortcomings of rrf. RRF relies solely on relative ranks and doesn’t consider actual score differences. For example, given these scores:
| | doc1 | doc2 | doc3 | doc4 |
|---|---|---|---|---|
| knn | 0.347 | 0.35 | 0.348 | 0.346 |
| bm25 | 100 | 1.5 | 1 | 0.5 |
| rrf score | 0.03226 | 0.03252 | 0.03200 | 0.03125 |
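For reference, these rrf scores are consistent with the usual reciprocal rank fusion formula, a sum of 1 / (rank_constant + rank) over the queries, assuming the default rank_constant of 60: doc1 ranks 3rd for knn and 1st for bm25, giving 1 / (60 + 3) + 1 / (60 + 1) ≈ 0.03226.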
rrf would rank the documents as: doc2 > doc1 > doc3 > doc4.
However, doc1 has a significantly higher bm25 score than the others, which rrf fails to capture because it only looks at relative ranks. The linear retriever, combined with normalization, correctly accounts for both the scores and their differences, producing a more meaningful ranking:
| | doc1 | doc2 | doc3 | doc4 |
|---|---|---|---|---|
| knn | 0.347 | 0.35 | 0.348 | 0.346 |
| bm25 | 1 | 0.01 | 0.005 | 0 |
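Assuming, again purely for illustration, equal weights of 1 for both queries, the final linear scores are simply the row sums of the table above:

- doc1: 0.347 + 1 = 1.347
- doc2: 0.35 + 0.01 = 0.36
- doc3: 0.348 + 0.005 = 0.353
- doc4: 0.346 + 0 = 0.346

so the linear retriever ranks doc1 first, in contrast to rrf.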
As we can see above, doc1’s strong bm25 score and ranking are properly accounted for and reflected in the final scores. In addition, all scores now lie in the [0, 1] range, so we can compare and combine them in a much more intuitive way (and even build offline optimization processes).
Putting it all together
To take full advantage of the linear retriever with normalization, the search request would look like this:
GET linear_retriever_blog/_search
{
  "retriever": {
    "linear": {
      "retrievers": [
        {
          "retriever": {
            "knn": {
              ...
            }
          },
          "weight": 5
        },
        {
          "retriever": {
            "standard": {
              ...
            }
          },
          "weight": 1.5,
          "normalizer": "minmax"
        }
      ]
    }
  }
}
This approach combines the best of both worlds: it retains the flexibility and intuitive scoring of the linear retriever, while ensuring consistent score scaling with MinMax normalization.
As with all our retrievers, the linear retriever can be integrated into any level of a hierarchical retriever tree, with support for explainability, match highlighting, field collapsing, and more.
When to pick the linear retriever and why it makes a difference
The linear retriever:
- Preserves relative importance by leveraging actual scores, not just ranks.
- Allows fine-tuning with weighted contributions from different queries.
- Enhances consistency using normalization, making hybrid search more robust and predictable.
Conclusion
The linear retriever is already available on Elasticsearch Serverless, and in the 8.18 and 9.0 releases! More examples and configuration parameters can also be found in our documentation. Try it out and see how it can improve your hybrid search experience; we look forward to your feedback. Happy searching!
Ready to try this out on your own? Start a free trial.
Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!