In this article, we explore how to use the LlamaIndex RankGPT reranker and the built-in Elasticsearch semantic reranker. Elastic provides an out-of-the-box experience for deploying and using rerankers as part of the retriever pipeline in a scalable way, without additional effort.
Originally, reranking in Elasticsearch required multiple steps, but now it’s integrated directly into the retriever pipeline: the first stage runs the search query, and the second stage reranks the results.

What is reranking?
Reranking is the process of taking a set of documents already retrieved for the user query and applying a more expensive, more accurate mechanism to push the most relevant ones to the top of the results.
There are many strategies for reranking documents. Some use specialized cross-encoder models, like the Elastic Rerank model or the cross-encoder for MS MARCO (cross-encoder/ms-marco-MiniLM-L6-v2). Others use an LLM for reranking. One of the advantages of the Elastic Rerank model is that it can be used as part of a semantic search pipeline or as a standalone tool to improve existing BM25 scoring systems.
A reranker needs a list of candidates and a user query to reorganize the candidates from most to least relevant based on the user query.
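To make the contract concrete, here is a minimal sketch of a reranker as a function of a query and a list of candidates. The `overlap_score` function is a toy stand-in for a real relevance model (a cross-encoder or an LLM), and the candidate strings are made up for illustration; none of these names come from LlamaIndex or Elasticsearch.

```python
from typing import Callable, List, Optional


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    top_n: Optional[int] = None,
) -> List[str]:
    """Reorder candidates from most to least relevant for the query."""
    ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


def overlap_score(query: str, candidate: str) -> float:
    """Toy relevance score: fraction of query tokens found in the candidate."""
    query_tokens = set(query.lower().split())
    candidate_tokens = set(candidate.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & candidate_tokens) / len(query_tokens)


# Hypothetical first-stage results, in retrieval order.
candidates = [
    "Sleek productivity laptop with great battery life",
    "High-end gaming laptop with extreme performance",
    "Best gaming laptops ranked for competitive players",
]
reranked = rerank("best laptops for gaming", candidates, overlap_score)
for position, text in enumerate(reranked, start=1):
    print(f"{position}. {text}")
```

A real reranker swaps `overlap_score` for a model call; the surrounding retrieve-then-reorder shape stays the same.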
In this article, we will explore the LlamaIndex RankGPT reranker, an implementation of the RankGPT approach, and the Elastic semantic reranker, which uses the Elastic Rerank model.
The complete example is available in this notebook.
Steps
Products index
Let’s create a reranker for laptops based on a user’s question. If a user is a hardcore gamer, they should get the most powerful machines. If they are a student, they might be okay with the lighter ones.
Let’s start by creating some documents in our notebook:
products = [
{
"name": "ASUS ROG Strix G16",
"description": "Powerful gaming laptop with Intel Core i9 and RTX 4070.",
"price": 1899.99,
"reviews": 4.7,
"sales": 320,
"features": [
"Intel Core i9",
"RTX 4070",
"16GB RAM",
"512GB SSD",
"165Hz Display",
],
},
{
"name": "Razer Blade 15",
"description": "Premium gaming laptop with an ultra-slim design and high refresh rate." ,
"price": 2499.99,
"reviews": 4.6,
"sales": 290,
"features": [
"Intel Core i7",
"RTX 4060",
"16GB RAM",
"1TB SSD",
"240Hz Display",
],
},
{
"name": "Acer Predator Helios 300",
"description": "Affordable yet powerful gaming laptop with RTX graphics.",
"price": 1399.99,
"reviews": 4.5,
"sales": 500,
"features": [
"Intel Core i7",
"RTX 3060",
"16GB RAM",
"512GB SSD",
"144Hz Display",
],
},
{
"name": "MSI Stealth 17",
"description": "High-performance gaming laptop with a 17-inch display.",
"price": 2799.99,
"reviews": 4.8,
"sales": 200,
"features": ["Intel Core i9", "RTX 4080", "32GB RAM", "1TB SSD", "4K Display"],
},
{
"name": "Dell XPS 15",
"description": "Sleek and powerful ultrabook with a high-resolution display.",
"price": 2199.99,
"reviews": 4.7,
"sales": 350,
"features": [
"Intel Core i7",
"RTX 3050 Ti",
"16GB RAM",
"1TB SSD",
"OLED Display",
],
},
{
"name": "HP Omen 16",
"description": "Gaming laptop with a balanced price-to-performance ratio.",
"price": 1599.99,
"reviews": 4.4,
"sales": 280,
"features": [
"AMD Ryzen 7",
"RTX 3060",
"16GB RAM",
"512GB SSD",
"165Hz Display",
],
},
{
"name": "Lenovo Legion 5 Pro",
"description": "Powerful Ryzen-powered gaming laptop with high refresh rate.",
"price": 1799.99,
"reviews": 4.6,
"sales": 400,
"features": [
"AMD Ryzen 9",
"RTX 3070 Ti",
"16GB RAM",
"1TB SSD",
"165Hz Display",
],
},
{
"name": "MacBook Pro 16",
"description": "Apple's most powerful laptop with M3 Max chip.",
"price": 3499.99,
"reviews": 4.9,
"sales": 500,
"features": [
"Apple M3 Max",
"32GB RAM",
"1TB SSD",
"Liquid Retina XDR Display",
],
},
{
"name": "Alienware m18",
"description": "High-end gaming laptop with extreme performance.",
"price": 2999.99,
"reviews": 4.8,
"sales": 150,
"features": [
"Intel Core i9",
"RTX 4090",
"32GB RAM",
"2TB SSD",
"480Hz Display",
],
},
{
"name": "Samsung Galaxy Book3 Ultra",
"description": "Ultra-lightweight yet powerful laptop with AMOLED display.",
"price": 2099.99,
"reviews": 4.5,
"sales": 180,
"features": [
"Intel Core i7",
"RTX 4070",
"16GB RAM",
"512GB SSD",
"AMOLED Display",
],
},
{
"name": "Microsoft Surface Laptop 5",
"description": "Sleek productivity laptop with great battery life.",
"price": 1699.99,
"reviews": 4.3,
"sales": 220,
"features": ["Intel Core i7", "16GB RAM", "512GB SSD", "Touchscreen"],
},
{
"name": "Gigabyte AORUS 17",
"description": "Performance-focused gaming laptop with powerful cooling.",
"price": 1999.99,
"reviews": 4.6,
"sales": 250,
"features": [
"Intel Core i9",
"RTX 4070",
"16GB RAM",
"1TB SSD",
"360Hz Display",
],
},
]
User question
Let's define the question we are going to use to rerank the results.
user_query = "Best laptops for gaming"
LlamaIndex reranking
Install dependencies and import packages
We install all the dependencies needed to run the LlamaIndex RankGPT reranker and to use Elasticsearch for document retrieval. Then, we load the laptops into an ElasticsearchStore, which is the LlamaIndex abstraction for the Elasticsearch vector database, and retrieve them using the VectorStoreIndex class.
pip install llama-index-core llama-index-llms-openai rank-llm llama-index-postprocessor-rankgpt-rerank llama-index-vector-stores-elasticsearch elasticsearch -q
import os
import nest_asyncio
from getpass import getpass
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
from llama_index.core import (
Document,
VectorStoreIndex,
QueryBundle,
Settings,
StorageContext,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.postprocessor.rankgpt_rerank import RankGPTRerank
from llama_index.llms.openai import OpenAI
from elasticsearch import Elasticsearch
nest_asyncio.apply()
Setup keys
os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"
os.environ["ELASTICSEARCH_ENDPOINT"] = "ELASTIC_ENDPOINT"
os.environ["ELASTICSEARCH_API_KEY"] = "ELASTIC_API_KEY"
INDEX_NAME = "products-laptops"
Elasticsearch client
We instantiate the Elasticsearch client to index documents and run queries against our cluster.
_client = Elasticsearch(
os.environ["ELASTICSEARCH_ENDPOINT"],
api_key=os.environ["ELASTICSEARCH_API_KEY"],
)
Mappings
We are going to use regular text fields for full-text search and also create a semantic_field with a copy of all the content so we can run semantic and hybrid queries. In Elasticsearch 8.18+, an inference endpoint will be deployed automatically.
# Create the index mapping
try:
    _client.indices.create(
        index=INDEX_NAME,
        body={
            "mappings": {
                "properties": {
                    "metadata": {
                        "properties": {
                            "name": {"type": "text", "copy_to": "semantic_field"},
                            "description": {
                                "type": "text",
                                "copy_to": "semantic_field",
                            },
                            "price": {
                                "type": "float",
                            },
                            "reviews": {
                                "type": "float",
                            },
                            "sales": {"type": "integer"},
                            "features": {
                                "type": "keyword",
                                "copy_to": "semantic_field",
                            },
                        }
                    },
                    "semantic_field": {"type": "semantic_text"},
                    "text": {
                        "type": "text"
                    },  # Field to store the text content for LlamaIndex
                    "embeddings": {"type": "dense_vector", "dims": 512},
                }
            }
        },
    )
    print("index created successfully")
except Exception as e:
    # Print the exception itself; not every exception carries an `info` payload
    print(f"Error creating index: {e}")
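The effect of copy_to in the mapping above can be pictured with a small sketch. This is only a conceptual illustration in plain Python, not how Elasticsearch implements it: the helper name `semantic_field_values` and the constant `COPY_TO_SOURCES` are hypothetical, and the real semantic_text field additionally chunks and embeds the copied values.

```python
from typing import Any, Dict, List

# Metadata fields that declare "copy_to": "semantic_field" in the mapping.
COPY_TO_SOURCES = ["name", "description", "features"]


def semantic_field_values(metadata: Dict[str, Any]) -> List[str]:
    """Collect the values that copy_to would route into semantic_field."""
    values: List[str] = []
    for field in COPY_TO_SOURCES:
        value = metadata.get(field)
        if value is None:
            continue
        if isinstance(value, list):
            values.extend(str(item) for item in value)  # multi-valued keyword field
        else:
            values.append(str(value))
    return values


sample_metadata = {
    "name": "Alienware m18",
    "description": "High-end gaming laptop with extreme performance.",
    "price": 2999.99,
    "reviews": 4.8,
    "sales": 150,
    "features": ["Intel Core i9", "RTX 4090", "32GB RAM"],
}
copied = semantic_field_values(sample_metadata)
print(copied)
```

Note that price, reviews, and sales are not copied, so the semantic field only sees the name, description, and features.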
Indexing data to LlamaIndex
Create an ElasticsearchStore from the array of products we defined. This will create an Elasticsearch vector store that we can consume later using VectorStoreIndex.
document_objects = []
es_store = ElasticsearchStore(
    es_url=os.environ["ELASTICSEARCH_ENDPOINT"],
    es_api_key=os.environ["ELASTICSEARCH_API_KEY"],
    index_name=INDEX_NAME,
    embedding_field="embeddings",
    text_field="text",
)
storage_context = StorageContext.from_defaults(vector_store=es_store)
for doc in products:
    text_content = f"""
    Product Name: {doc["name"]}
    Description: {doc["description"]}
    Price: ${doc["price"]}
    Reviews: {doc["reviews"]} stars
    Sales: {doc["sales"]} units sold
    Features: {', '.join(doc["features"])}
    """
    metadata = {
        "name": doc["name"],
        "description": doc["description"],
        "price": doc["price"],
        "reviews": doc["reviews"],
        "sales": doc["sales"],
        "features": doc["features"],
    }
    document_objects.append(Document(text=text_content, metadata=metadata))
index = VectorStoreIndex([], storage_context=storage_context)
for doc in document_objects:
    index.insert(doc)
LLM setup
Define the LLM that will work as a reranker:
Settings.llm = OpenAI(temperature=0, model="gpt-4.1-mini")
Settings.chunk_size = 512
Rerank feature
We now create a function that executes a retriever to get the most similar documents to the user question from the vector index, then applies a RankGPTRerank reranking on top and finally returns the documents reordered.
def get_retrieved_nodes(
    query_str, vector_top_k=10, reranker_top_n=5, with_reranker=False
):
    query_bundle = QueryBundle(query_str)
    # configure retriever
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=vector_top_k,
    )
    retrieved_nodes = retriever.retrieve(query_bundle)
    if with_reranker:
        # configure reranker
        reranker = RankGPTRerank(
            llm=OpenAI(
                model="gpt-4.1-mini",
                temperature=0.0,
                api_key=os.environ["OPENAI_API_KEY"],
            ),
            top_n=reranker_top_n,
            verbose=True,
        )
        retrieved_nodes = reranker.postprocess_nodes(retrieved_nodes, query_bundle)
    return retrieved_nodes
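RankGPT-style rerankers generally work listwise: the LLM sees the numbered candidates and answers with a permutation such as "[2] > [3] > [1]", which is then applied to the candidate list. The sketch below illustrates that parsing step only. It is a simplified illustration, not the LlamaIndex implementation, and the response string is a hypothetical LLM answer.

```python
import re
from typing import List


def parse_permutation(response: str, num_candidates: int) -> List[int]:
    """Turn a ranking string like '[2] > [3] > [1]' into zero-based indices."""
    order: List[int] = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1  # the prompt numbers candidates from 1
        # Ignore out-of-range identifiers and duplicates.
        if 0 <= idx < num_candidates and idx not in order:
            order.append(idx)
    # Candidates the model left out keep their original relative order at the end.
    order.extend(i for i in range(num_candidates) if i not in order)
    return order


def apply_permutation(candidates: List[str], order: List[int]) -> List[str]:
    """Reorder candidates according to the parsed permutation."""
    return [candidates[i] for i in order]


retrieved = ["Razer Blade 15", "ASUS ROG Strix G16", "Alienware m18"]
llm_response = "[2] > [3] > [2] > [7]"  # hypothetical, deliberately messy answer
order = parse_permutation(llm_response, len(retrieved))
print(apply_permutation(retrieved, order))
```

Tolerating duplicates, out-of-range identifiers, and omissions matters because the ranking comes back as free text rather than a structured response.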
We also create a function to format the resulting documents.
def visualize_retrieved_nodes(nodes):
    formatted_results = []
    for node in nodes:
        text = node.node.get_text()
        product_name = text.split("Product Name:")[1].split("\n")[0].strip()
        price = text.split("Price:")[1].split("\n")[0].strip()
        reviews = text.split("Reviews:")[1].split("\n")[0].strip()
        features = text.split("Features:")[1].strip()
        formatted_result = f"{price} - {product_name} ({reviews}) {features}"
        formatted_results.append(formatted_result)
    return formatted_results
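The split-based parsing above raises an IndexError if a label is missing from the node text. As a more tolerant alternative, here is a minimal sketch that extracts a labeled value with a regular expression and returns None when the label is absent. The helper name `extract_field` is hypothetical, and the sample text mimics the format built during indexing.

```python
import re
from typing import Optional


def extract_field(text: str, label: str) -> Optional[str]:
    """Return the value following 'label:' on its own line, or None if absent."""
    pattern = rf"^\s*{re.escape(label)}:\s*(.+?)\s*$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None


sample_text = """
    Product Name: Alienware m18
    Price: $2999.99
    Reviews: 4.8 stars
    Features: Intel Core i9, RTX 4090, 32GB RAM
"""
print(extract_field(sample_text, "Product Name"))
print(extract_field(sample_text, "Price"))
print(extract_field(sample_text, "Sales"))  # label not present in this sample
```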
Without rerank
We first run the request without reranking.
new_nodes = get_retrieved_nodes(
    query_str=user_query,
    vector_top_k=5,
    with_reranker=False,
)
results = visualize_retrieved_nodes(new_nodes)
print("\nTop 5 results without rerank:")
for idx, result in enumerate(results, start=1):
    print(f"{idx}. {result}")
Answer:
Top 5 results without rerank:
1. $2499.99 - Razer Blade 15 (4.6 stars) Intel Core i7, RTX 4060, 16GB RAM, 1TB SSD, 240Hz Display
2. $1899.99 - ASUS ROG Strix G16 (4.7 stars) Intel Core i9, RTX 4070, 16GB RAM, 512GB SSD, 165Hz Display
3. $1999.99 - Gigabyte AORUS 17 (4.6 stars) Intel Core i9, RTX 4070, 16GB RAM, 1TB SSD, 360Hz Display
4. $2799.99 - MSI Stealth 17 (4.8 stars) Intel Core i9, RTX 4080, 32GB RAM, 1TB SSD, 4K Display
5. $2999.99 - Alienware m18 (4.8 stars) Intel Core i9, RTX 4090, 32GB RAM, 2TB SSD, 480Hz Display
With rerank
Now we enable reranking, which executes the same vector search and then uses an LLM to rerank the top 5 results against the Best laptops for gaming query. We can see subtle differences, like the Razer Blade 15, the only result with an Intel Core i7 processor, being pushed to the bottom and the Alienware m18 being promoted to position 2.
new_nodes = get_retrieved_nodes(
    user_query,
    vector_top_k=5,
    reranker_top_n=5,
    with_reranker=True,
)
results = visualize_retrieved_nodes(new_nodes)
print("\nTop 5 results with reranking:")
for idx, result in enumerate(results, start=1):
    print(f"{idx}. {result}")
Answer:
Top 5 results with reranking:
1. $1899.99 - ASUS ROG Strix G16 (4.7 stars) Intel Core i9, RTX 4070, 16GB RAM, 512GB SSD, 165Hz Display
2. $2999.99 - Alienware m18 (4.8 stars) Intel Core i9, RTX 4090, 32GB RAM, 2TB SSD, 480Hz Display
3. $2799.99 - MSI Stealth 17 (4.8 stars) Intel Core i9, RTX 4080, 32GB RAM, 1TB SSD, 4K Display
4. $1999.99 - Gigabyte AORUS 17 (4.6 stars) Intel Core i9, RTX 4070, 16GB RAM, 1TB SSD, 360Hz Display
5. $2499.99 - Razer Blade 15 (4.6 stars) Intel Core i7, RTX 4060, 16GB RAM, 1TB SSD, 240Hz Display
Elasticsearch semantic reranking
Inference rerank endpoint
Create an inference endpoint that we can either call on its own to rerank a list of candidates against a query or use as part of a retriever:
INFERENCE_RERANK_NAME = "my-elastic-rerank"
try:
    _client.options(
        request_timeout=60, max_retries=3, retry_on_timeout=True
    ).inference.put(
        task_type="rerank",
        inference_id=INFERENCE_RERANK_NAME,
        body={
            "service": "elasticsearch",
            "service_settings": {
                "model_id": ".rerank-v1",
                "num_threads": 1,
                "adaptive_allocations": {
                    "enabled": True,
                    "min_number_of_allocations": 1,
                    "max_number_of_allocations": 4,
                },
            },
        },
    )
    print("Inference endpoint created successfully.")
except Exception as e:
    # Print the exception itself; not every exception carries an `info` payload
    print(f"Error creating inference endpoint: {e}")
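For the standalone case, a minimal sketch of calling the endpoint directly might look like the following. It assumes the Python client exposes `client.inference.inference(...)` with `task_type`, `inference_id`, `query`, and `input` arguments, and that a rerank response contains a `rerank` list of items with `index` and `relevance_score`; check both against the documentation for your client version. So that the example runs without a cluster, it uses a stub client with made-up scores.

```python
from types import SimpleNamespace
from typing import List, Tuple


def rerank_standalone(
    client, inference_id: str, query: str, candidates: List[str]
) -> List[Tuple[str, float]]:
    """Rerank plain-text candidates with a rerank inference endpoint."""
    response = client.inference.inference(
        task_type="rerank",
        inference_id=inference_id,
        query=query,
        input=candidates,
    )
    # Assumed response shape: {"rerank": [{"index": ..., "relevance_score": ...}, ...]}
    return [
        (candidates[item["index"]], item["relevance_score"])
        for item in response["rerank"]
    ]


class _StubInference:
    """Stand-in for the real inference client; returns fixed, made-up scores."""

    def inference(self, *, task_type, inference_id, query, input):
        scores = [0.2, 0.9, 0.5][: len(input)]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return {"rerank": [{"index": i, "relevance_score": scores[i]} for i in ranked]}


stub_client = SimpleNamespace(inference=_StubInference())
ranked = rerank_standalone(
    stub_client,
    "my-elastic-rerank",
    "Best laptops for gaming",
    ["Microsoft Surface Laptop 5", "Alienware m18", "HP Omen 16"],
)
print(ranked)
```

With a real cluster, the stub would be replaced by the Elasticsearch client created earlier.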
We define a function that executes a search query and returns the hits.
async def es_search(query):
    # The client call itself is synchronous; the function is async so it can be awaited in notebook cells
    response = _client.search(index=INDEX_NAME, body=query)
    # Returns an empty list when nothing matches
    return response["hits"]["hits"]
As with LlamaIndex, we create a function to format the resulting documents.
def format_es_results(hits):
    formatted_results = []
    for hit in hits:
        metadata = hit["_source"]["metadata"]
        name = metadata.get("name")
        price = metadata.get("price")
        reviews = metadata.get("reviews")
        features = metadata.get("features")
        formatted_result = f"{price} - {name} ({reviews}) {features}"
        formatted_results.append(formatted_result)
    return formatted_results
Semantic query
We will start with a semantic query to return the most similar results to the user’s question.
semantic_results = await es_search(
    {
        "size": 5,
        "query": {
            "semantic": {
                "field": "semantic_field",
                "query": user_query,
            }
        },
        "_source": {
            "includes": [
                "metadata",
            ]
        },
    }
)
semantic_formatted_results = format_es_results(semantic_results)
print("Query results:")
for idx, result in enumerate(semantic_formatted_results, start=1):
    print(f"{idx}. {result}")
Query results:
1. 2999.99 - Alienware m18 (4.8) ['Intel Core i9', 'RTX 4090', '32GB RAM', '2TB SSD', '480Hz Display']
2. 2799.99 - MSI Stealth 17 (4.8) ['Intel Core i9', 'RTX 4080', '32GB RAM', '1TB SSD', '4K Display']
3. 1599.99 - HP Omen 16 (4.4) ['AMD Ryzen 7', 'RTX 3060', '16GB RAM', '512GB SSD', '165Hz Display']
4. 1399.99 - Acer Predator Helios 300 (4.5) ['Intel Core i7', 'RTX 3060', '16GB RAM', '512GB SSD', '144Hz Display']
5. 1999.99 - Gigabyte AORUS 17 (4.6) ['Intel Core i9', 'RTX 4070', '16GB RAM', '1TB SSD', '360Hz Display']
Next, we wrap the same semantic query in a text_similarity_reranker retriever, which sends the top results to the rerank inference endpoint and reorders them according to the inference_text instruction.
rerank_results = await es_search(
    {
        "size": 5,
        "_source": {
            "includes": [
                "metadata",
            ]
        },
        "retriever": {
            "text_similarity_reranker": {
                "retriever": {
                    "standard": {
                        "query": {
                            "semantic": {
                                "field": "semantic_field",
                                "query": user_query,
                            }
                        }
                    }
                },
                "field": "semantic_field",
                "inference_id": INFERENCE_RERANK_NAME,
                "inference_text": "reorder by quality-price ratio",
                "rank_window_size": 5,
            }
        },
    }
)
rerank_formatted_results = format_es_results(rerank_results)
print("Query results:")
for idx, result in enumerate(rerank_formatted_results, start=1):
    print(f"{idx}. {result}")
Query results:
1. 1399.99 - Acer Predator Helios 300 (4.5) ['Intel Core i7', 'RTX 3060', '16GB RAM', '512GB SSD', '144Hz Display']
2. 2999.99 - Alienware m18 (4.8) ['Intel Core i9', 'RTX 4090', '32GB RAM', '2TB SSD', '480Hz Display']
3. 2799.99 - MSI Stealth 17 (4.8) ['Intel Core i9', 'RTX 4080', '32GB RAM', '1TB SSD', '4K Display']
4. 1999.99 - Gigabyte AORUS 17 (4.6) ['Intel Core i9', 'RTX 4070', '16GB RAM', '1TB SSD', '360Hz Display']
5. 1599.99 - HP Omen 16 (4.4) ['AMD Ryzen 7', 'RTX 3060', '16GB RAM', '512GB SSD', '165Hz Display']
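Because the reranker wraps an ordinary first-stage query, the request body can be generated by a small helper, which makes it easy to swap the semantic query for a BM25 one. The sketch below only builds the dictionary; the helper name `build_rerank_request` is hypothetical, and the `match` query on the `text` field is an illustrative assumption rather than something run in this article.

```python
from typing import Any, Dict, Optional


def build_rerank_request(
    first_stage_query: Dict[str, Any],
    field: str,
    inference_id: str,
    inference_text: str,
    size: int = 5,
    rank_window_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Wrap a first-stage query in a text_similarity_reranker retriever."""
    return {
        "size": size,
        "_source": {"includes": ["metadata"]},
        "retriever": {
            "text_similarity_reranker": {
                "retriever": {"standard": {"query": first_stage_query}},
                "field": field,
                "inference_id": inference_id,
                "inference_text": inference_text,
                # Default: rerank exactly as many documents as are returned.
                "rank_window_size": rank_window_size or size,
            }
        },
    }


user_query = "Best laptops for gaming"

# Same shape as the request above: semantic first stage.
semantic_request = build_rerank_request(
    {"semantic": {"field": "semantic_field", "query": user_query}},
    field="semantic_field",
    inference_id="my-elastic-rerank",
    inference_text="reorder by quality-price ratio",
)

# Variation (assumption): BM25 first stage on the LlamaIndex text field.
bm25_request = build_rerank_request(
    {"match": {"text": user_query}},
    field="semantic_field",
    inference_id="my-elastic-rerank",
    inference_text=user_query,
    rank_window_size=10,
)
print(semantic_request["retriever"]["text_similarity_reranker"]["rank_window_size"])
```

Either dictionary could be passed to the `es_search` function defined earlier.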
In the following table, we can see a position comparison across the different tests:
| Laptop model | LlamaIndex (no rerank) | LlamaIndex (with rerank) | Elastic (no rerank) | Elastic (with rerank) |
|---|---|---|---|---|
| Razer Blade 15 | 1 | 5 | - | - |
| ASUS ROG Strix G16 | 2 | 1 | - | - |
| Gigabyte AORUS 17 | 3 | 4 | 5 | 4 |
| MSI Stealth 17 | 4 | 3 | 2 | 3 |
| Alienware m18 | 5 | 2 | 1 | 2 |
| HP Omen 16 | - | - | 3 | 5 |
| Acer Predator Helios 300 | - | - | 4 | 1 |
Legend: A dash (-) indicates the item did not appear in the top 5 for that method.
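A position comparison like this one can be derived directly from the four ranked lists printed earlier. The following sketch does that in plain Python; the helper name `position_table` is hypothetical, and the lists are copied from the outputs shown in this article.

```python
from typing import Dict, List, Optional


def position_table(rankings: Dict[str, List[str]]) -> Dict[str, List[Optional[int]]]:
    """Map each item to its 1-based position per method (None if absent)."""
    table: Dict[str, List[Optional[int]]] = {}
    for ranked in rankings.values():
        for name in ranked:
            table.setdefault(name, [])
    for name in table:
        for ranked in rankings.values():
            table[name].append(ranked.index(name) + 1 if name in ranked else None)
    return table


rankings = {
    "LlamaIndex (no rerank)": [
        "Razer Blade 15", "ASUS ROG Strix G16", "Gigabyte AORUS 17",
        "MSI Stealth 17", "Alienware m18",
    ],
    "LlamaIndex (with rerank)": [
        "ASUS ROG Strix G16", "Alienware m18", "MSI Stealth 17",
        "Gigabyte AORUS 17", "Razer Blade 15",
    ],
    "Elastic (no rerank)": [
        "Alienware m18", "MSI Stealth 17", "HP Omen 16",
        "Acer Predator Helios 300", "Gigabyte AORUS 17",
    ],
    "Elastic (with rerank)": [
        "Acer Predator Helios 300", "Alienware m18", "MSI Stealth 17",
        "Gigabyte AORUS 17", "HP Omen 16",
    ],
}
table = position_table(rankings)
for name, row in table.items():
    cells = " | ".join("-" if pos is None else str(pos) for pos in row)
    print(f"{name} | {cells}")
```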
The Elastic reranker stays consistent with the LlamaIndex reranking by keeping high-end laptops, like the Alienware m18 and MSI Stealth 17, near the top, while the "reorder by quality-price ratio" instruction promotes the more affordable Acer Predator Helios 300 to first place.
Conclusion
Rerankers are a powerful tool for increasing the quality of our search systems and for surfacing the most relevant information for each user’s question.
LlamaIndex offers a variety of reranker strategies using specialized models or LLMs. In their simplest implementation, you can create an in-memory vector store and store your documents locally, then retrieve and rerank, or use Elasticsearch as the vector store for persistence.
Elasticsearch, on the other hand, provides an out-of-the-box inference endpoints framework where you can use rerankers as part of the retrieval pipeline or as a standalone endpoint. You can also choose from many providers like Elastic itself, Cohere, Jina, or Alibaba, or deploy any third-party compatible model. With the simplest implementation of Elasticsearch, both your documents and your reranking model live on your Elasticsearch cluster, allowing you to scale.