Take your search experiences up to level 11 with our new state-of-the-art cross-encoder Elastic Rerank model (in Tech Preview). Reranking models provide a semantic boost to any search experience, without requiring you to change the schema of your data, giving you room to explore other relevance tools for semantic relevance on your own time and within your budget.
Semantic boost your keyword search: Regardless of where or how your data is stored, indexed, or searched today, semantic reranking is an easy additional step that boosts your existing search results with semantic understanding. You can apply it as needed, without changing your existing data or indexing pipelines, and you can start with an Elastic foundational model as your easy first choice.
Flexibility of choice for any budget: All search experiences can be improved with the addition of semantic meaning, which is typically applied by using a dense or sparse vector model such as ELSER. However, achieving your relevance goals doesn’t require a one-size-fits-all solution; it’s about mixing and matching tools to balance performance and cost. Hybrid search is one such option, improving relevance by combining semantic search with keyword search using reciprocal rank fusion (RRF) in Elasticsearch. The Elastic Rerank model is now an additional lever that can enhance search relevance in place of semantic search, giving you the flexibility to optimize for both relevance and budget.
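To make the RRF step concrete, here is a minimal Python sketch of reciprocal rank fusion, assuming the standard formulation in which each result list adds 1 / (rank_constant + rank) to a document's score, with ranks starting at 1. The function name, document IDs, and default constant are illustrative assumptions, not part of any Elastic client library.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rank_constant=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each list contributes 1 / (rank_constant + rank) for every document it
    contains, where rank is the 1-based position in that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from a keyword query and a semantic query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_b", "doc_c", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits], rank_constant=1)
print(fused)
```

Because only ranks are used, the two result lists do not need comparable scores; documents that appear near the top of both lists rise to the top of the fused list.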
First made available on serverless, but now available in tech preview in 8.17 for Elasticsearch, the benefits of our model exceed those of other models in the market today.
Performant and Efficient: The Elastic Rerank model outperforms other significantly larger reranking models. Built on the DeBERTa v3 architecture, it has been fine-tuned by distillation on a diverse dataset. Our detailed testing shows a 40% uplift on a broad range of retrieval tasks and up to 90% on question answering data sets.
By comparison, the Elastic Rerank model delivers relevance that is significantly better than, or comparable to, that of much larger models. In our testing, a few models, such as bge-reranker-v2-gemma, came closest in relevance but are an order of magnitude larger in terms of parameter count. That being said, we provide integrations in our Open Inference API to enable access to other third-party rerankers, so you can easily test and see for yourself.
Easy to use
Not only are the performance and cost characteristics of the Elastic Rerank model great, we have also made it really easy to apply to lexical search to improve relevance. We want to provide easy-to-use primitives that help you build effective search quickly, without having to make lots of decisions, from which models to use to how to use them in your search pipeline. We make it easy to get started and to scale.
You can now use Elastic Rerank through the Inference API with the text_similarity_reranker retriever. Once the model is downloaded and deployed, each search request can run a full hybrid search query and rerank the resulting set in one simple _search query. The following request creates the inference endpoint:
PUT _inference/rerank/elastic-rerank
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".rerank-v1",
    "num_allocations": 1,
    "num_threads": 1
  }
}
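If you prefer to create the endpoint from code rather than from the console, the request above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client example: the function name, base URL, and API key are placeholders, and it assumes your cluster accepts API key authentication via the `Authorization: ApiKey ...` header. The request is built but not sent.

```python
import json
import urllib.request


def build_rerank_endpoint_request(base_url, api_key, inference_id="elastic-rerank",
                                  num_allocations=1, num_threads=1):
    """Build (but do not send) the PUT request that creates the rerank endpoint."""
    body = {
        "service": "elasticsearch",
        "service_settings": {
            "model_id": ".rerank-v1",
            "num_allocations": num_allocations,
            "num_threads": num_threads,
        },
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_inference/rerank/{inference_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}",
        },
        method="PUT",
    )


# Placeholder URL and key, for illustration only.
request = build_rerank_endpoint_request("https://localhost:9200", "<your-api-key>")

# To actually create the endpoint, send it:
# urllib.request.urlopen(request)
```

Keeping the body in a function makes it simple to vary `num_allocations` and `num_threads` per environment while the model ID stays fixed.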
It’s really easy to integrate the Elastic Rerank model in your code and to combine different retrievers for hybrid search with reranking. Here is an example that uses ELSER for semantic search, RRF for hybrid search, and the reranker to rank the results.
GET retrievers_example/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "rrf": {
          "retrievers": [
            {
              "standard": {
                "query": {
                  "sparse_vector": {
                    "field": "vector.tokens",
                    "inference_id": ".elser-2-elasticsearch",
                    "query": "Cobra Kai was a homage to the greatest movie of all time!"
                  }
                }
              }
            },
            {
              "knn": {
                "field": "vector",
                "query_vector": [
                  0.23,
                  0.67,
                  0.89
                ],
                "k": 3,
                "num_candidates": 5
              }
            }
          ],
          "rank_window_size": 10,
          "rank_constant": 1
        }
      },
      "field": "text",
      "inference_id": "elastic-rerank",
      "inference_text": "Which show continues the awesomeness of Karate Kid, the 1984 movie?"
    }
  },
  "_source": ["text", "topic"]
}
If you have a fun dataset like mine that combines the love of AI with Cobra Kai, you will get something meaningful.
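The request above reranks a hybrid result set, but the same retriever can also wrap a plain keyword query, which is the "semantic boost" case for existing lexical search. The Python sketch below builds such a _search body as a dictionary. The helper name and the choice of a `match` query are illustrative assumptions; the field name `text` and the `elastic-rerank` endpoint ID are reused from the examples above.

```python
import json


def keyword_rerank_body(text_field, user_query, inference_id="elastic-rerank",
                        rank_window_size=10):
    """Build a _search body: a keyword match query whose top hits are reranked."""
    return {
        "retriever": {
            "text_similarity_reranker": {
                # First stage: ordinary lexical retrieval, no vectors required.
                "retriever": {
                    "standard": {
                        "query": {"match": {text_field: user_query}}
                    }
                },
                # Second stage: rerank the top hits on the same text field.
                "field": text_field,
                "inference_id": inference_id,
                "inference_text": user_query,
                "rank_window_size": rank_window_size,
            }
        }
    }


body = keyword_rerank_body("text", "Which show continues the awesomeness of Karate Kid?")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a `GET <index>/_search` line in the console. Only the top `rank_window_size` keyword hits are passed to the reranker, so this value is the main lever for trading relevance against inference cost.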
Summary
- English only cross-encoder model
- Semantic boost your keyword search with little to no change to how data is already indexed and searched
- More control and flexibility over the cost of semantic boosting decoupled from indexing and search
- Reuse the data you already have in Elasticsearch
- Delivers significant improvements in relevance and performance (40% better on average for a large range of retrieval tasks and up to 90% better on question answering tasks as compared to significantly larger models, tested with over 21 datasets with an average of +13 points nDCG@10 improvement)
- Easy-to-use, out-of-the-box; built into the Elastic Inference API, easy to load and use in search pipelines
- Available in technical preview across our product suite; the easiest way to get started is on Elasticsearch Serverless
If you want to read all the details of how we built this, head over to our blog on Search Labs.