Elasticsearch now with BBQ by default & ACORN for filtered vector search

Explore how Elasticsearch's vector search now delivers better results faster, and at a lower cost.

In the world of AI-powered search, three things matter above all else: query speed, ranking accuracy, and the cost of the resources required to achieve them. At Elastic, we're constantly pushing the boundaries on all three fronts. I'm excited to share two recent dense vector search improvements in Elasticsearch 9.1: a novel algorithm named ACORN for faster filtered vector search, and new evidence showing our default quantization method, BBQ, not only reduces costs but can actually improve ranking quality.

Real-world search is rarely a simple "find me things like this." It’s "find me products like this in my size," or "find me documents similar to this one from last quarter." Even avoiding deleted documents is a filter, so in practice, filtered search is a very common scenario.

Making filtered vector search fast without compromising accuracy is a deep technical challenge. The HNSW algorithm performs a graph search in which nodes represent vectors; the graphs are constructed at indexing time, not per query. When pre-filtering for accurate result sets, as Elasticsearch does, the naive method for filtered search is to traverse the graph and collect only the nodes that pass the filter. The problem with that naive approach is that nodes that fail the filter must still be evaluated so the search can continue through their neighbors to the rest of the graph, which slows down the query. The effect is most pronounced with restrictive filters, where the majority of documents do not pass the filter. We already had a partial solution for this, but we thought we could do better.
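To make the problem concrete, here is a simplified, illustrative sketch of that naive approach in Python (single graph layer, no beam-width or early-termination logic; `graph`, `dist`, and `passes_filter` are assumed helpers, not Lucene's actual code):

```python
import heapq

def naive_filtered_knn(graph, entry_point, query, k, passes_filter, dist):
    # Illustrative sketch of naive pre-filtered HNSW traversal. Note that every
    # neighbor is scored against the query so the traversal can keep moving,
    # even when the node itself fails the filter and can never be returned.
    visited = {entry_point}
    frontier = [(dist(query, graph.vector(entry_point)), entry_point)]  # min-heap
    matches = []  # (distance, node) pairs that passed the filter

    while frontier:
        d, node = heapq.heappop(frontier)
        if passes_filter(node):
            matches.append((d, node))           # only filtered nodes are collected...
        for neighbor in graph.neighbors(node):  # ...but ALL neighbors are scored
            if neighbor not in visited:
                visited.add(neighbor)
                heapq.heappush(frontier, (dist(query, graph.vector(neighbor)), neighbor))

    return sorted(matches)[:k]
```

With a restrictive filter, most of the distance computations in this sketch are spent on nodes that can never appear in the results.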

We chose ACORN-1 (ANN Constraint-Optimized Retrieval Network), an algorithm described in an academic paper published in 2024, for performing filtered k-Nearest Neighbor (kNN) search. ACORN integrates the filtering process directly into the HNSW graph traversal. With ACORN-1, only nodes that pass the filter are evaluated, but to reduce the chances of missing relevant sections of the graph, the second-level neighbors, i.e., the neighbors of the neighbors, are also evaluated (if the filter accepts them). The Lucene implementation used by Elasticsearch adds certain heuristics to further improve the results (detailed in this blog).
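The core idea can be sketched as follows (again an illustrative simplification, not the Lucene implementation; `graph` and `passes_filter` are assumed helpers):

```python
def acorn_candidate_neighbors(graph, node, passes_filter):
    # ACORN-1 style neighbor expansion: only nodes accepted by the filter are
    # ever scored. To avoid losing connectivity when direct neighbors are
    # filtered out, filter-passing neighbors-of-neighbors are considered too.
    candidates = set()
    for neighbor in graph.neighbors(node):
        if passes_filter(neighbor):
            candidates.add(neighbor)          # first-level neighbor passes the filter
        for second in graph.neighbors(neighbor):
            if passes_filter(second):
                candidates.add(second)        # second-level neighbor passes the filter
    return candidates
```

During traversal, only the vectors in this candidate set are compared against the query, which avoids the wasted distance computations of the naive approach.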

When choosing the specific algorithm and implementation, we explored alternatives with the theoretical potential to reduce latency even further, but decided not to pursue them because they force the user to declare, before indexing, which fields will be used for filtering. We value the flexibility to define filtering fields after documents are ingested, because real-life indices evolve. The potential minor gain did not justify the loss of flexibility, and we can recover the performance through other means, like BBQ (see below).

The results are a step-change in performance. We've measured typical speedups of 5x, with some highly selective filters showing much larger improvements. This is a massive enhancement for complex, real-world queries. All you need to do to benefit from ACORN-1 is run filtered vector queries in Elasticsearch.
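For example, a filtered kNN query through the Python client might look like the sketch below (the index name, field names, filter, and embedding helper are all hypothetical; the ACORN-based traversal kicks in automatically):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")      # adjust to your deployment

query_embedding = embed("trail running shoes")   # assumed helper wrapping your embedding model

response = es.search(
    index="products",                            # hypothetical index
    knn={
        "field": "embedding",
        "query_vector": query_embedding,
        "k": 10,
        "num_candidates": 50,
        "filter": {"term": {"size": "42"}},      # "find me products like this in my size"
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```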

BBQ: The surprising superpower of better ranking

Reducing the memory footprint of vectors is crucial for building scalable, cost-effective AI systems. Our Better Binary Quantization (BBQ) method achieves this by compressing high-dimensional float32 vectors by roughly 32x (a back-of-the-envelope illustration follows the list below). We've previously discussed how BBQ is superior to other techniques like Product Quantization in terms of recall, latency, and cost in our Search Labs blogs:

  1. Better Binary Quantization (BBQ) in Lucene and Elasticsearch
  2. How to implement Better Binary Quantization (BBQ) into your use case and why you should
  3. Better Binary Quantization (BBQ) vs. Product Quantization
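To put that compression factor in perspective, here is a rough, illustrative calculation for a hypothetical corpus of one million 1024-dimensional vectors (the numbers are assumptions, not a benchmark):

```python
# Back-of-the-envelope memory math for the ~32x compression (illustrative numbers).
num_vectors = 1_000_000
dims = 1024

float32_bytes = num_vectors * dims * 4   # 4 bytes per dimension in float32
bbq_bytes = float32_bytes / 32           # ~32x reduction, ignoring small per-vector correction overhead

print(f"float32: {float32_bytes / 2**30:.1f} GiB")  # ~3.8 GiB
print(f"BBQ:     {bbq_bytes / 2**20:.0f} MiB")      # ~122 MiB
```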

The intuitive assumption, however, is that such dramatic, lossy compression must come at the cost of ranking quality. Our recent extensive benchmarks consistently show that we can fully compensate for it by exploring a greater part of the graph and reranking, achieving improvements not only in cost and performance, but also in relevance ranking.

BBQ isn't just a compression trick; it’s a sophisticated two-stage search process:

  1. Broad scan: First, it uses the tiny, compressed vectors to rapidly scan the corpus and identify a set of top-ranking candidates. This set is larger than the number of results the user requests (oversampling).
  2. Precise reranking: Then, it takes the top candidates from this initial scan and reranks them using their original, full-precision float32 vectors to determine the final order.

This process of oversampling and reranking acts as a powerful corrective, often finding highly relevant results that a pure float32 HNSW search, in its more limited traversal of the graph, might have missed.
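A minimal sketch of the oversample-and-rerank idea, assuming a `quantized_index` helper that scans the compressed vectors (illustrative Python, not Lucene's implementation):

```python
import numpy as np

def oversample_and_rerank(query, quantized_index, float32_vectors, k=10, oversample=3):
    # Stage 1: broad scan over the compressed vectors to get oversample*k candidates.
    candidate_ids = quantized_index.approximate_top(query, n=oversample * k)

    # Stage 2: precise reranking of the candidates with the original float32 vectors
    # (cosine similarity here), keeping the best k in corrected order.
    q = query / np.linalg.norm(query)
    rescored = []
    for doc_id in candidate_ids:
        v = float32_vectors[doc_id]
        rescored.append((float(np.dot(q, v / np.linalg.norm(v))), doc_id))
    rescored.sort(reverse=True)
    return rescored[:k]
```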

The proof is in the ranking

To measure this, we used NDCG@10 (Normalized Discounted Cumulative Gain at 10), a standard metric that evaluates the quality of the top 10 search results. A higher NDCG score means more relevant documents are ranked higher. You can learn more about this in our ranking evaluation API documentation.
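For readers who want the mechanics, here is a minimal NDCG@10 computation (using one common gain formulation; evaluation toolkits differ in the details):

```python
import math

def ndcg_at_10(ranked_doc_ids, relevance):
    # relevance maps doc_id -> graded relevance judgment (0 or missing = irrelevant)
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids[:10]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(relevance.values(), reverse=True)[:10]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```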

We ran benchmarks across 10 public datasets from the BEIR benchmark, comparing traditional BM25 search, vector search with the e5-small model (float32 vectors), and vector search with the same model using BBQ. We chose e5-small because we know from previous benchmarks that BBQ shines with high vector dimensionality, so we wanted to test it where it struggles: with the rather low-dimensional vectors that e5-small produces. The table below shows the NDCG@10 results we measured.

| Data set | BM25 | e5-small float32 | e5-small BBQ with defaults |
| --- | --- | --- | --- |
| Climate-FEVER | 0.143 | 0.1998 | 0.2059 |
| DBPedia | 0.306 | 0.3143 | 0.3414 |
| FiQA-2018 | 0.251 | 0.3002 | 0.3136 |
| Natural Questions | 0.292 | 0.4899 | 0.5251 |
| NFCorpus | 0.321 | 0.2928 | 0.3067 |
| Quora | 0.808 | 0.8593 | 0.8765 |
| SCIDOCS | 0.155 | 0.1351 | 0.1381 |
| SciFact | 0.683 | 0.6569 | 0.677 |
| Touché-2020 | 0.337 | 0.2096 | 0.2089 |
| TREC-COVID | 0.615 | 0.7122 | 0.7189 |

The results are stunning. BBQ achieved better ranking quality than pure float32 search in 9 out of the 10 datasets. Even in the single exception, the difference was negligible. Furthermore, BBQ was the top-performing method overall in 6 of the 10 datasets. As a side note, the best ranking is often obtained by using a hybrid of BM25 and vector queries, plus other factors like distance, time, and popularity, and Elasticsearch excels in these types of queries.

One may ask how BBQ can provide better ranking than float32, given that BBQ is a lossy compression. Yes, with BBQ we oversample and rerank using the float32 vectors, but that shouldn't matter, since conceptually float32 search already ranks all documents with the float32 vectors. The answer is that with HNSW we only evaluate a limited number of vectors per shard, controlled by the num_candidates parameter. When no quantization is used, the default num_candidates is 1.5*k, with k being the number of results returned to the user (you can read more about the benchmarks behind that default here). With BBQ, comparing vectors is faster, so we can afford a larger num_candidates while still reducing latency; after some benchmarking work, we set the default num_candidates for BBQ to max(1.5*k, oversample*k).

For the benchmark above, which relies on our defaults and uses k=10, num_candidates works out as follows:

  • Float32: 1.5*k = 1.5*10 = 15
  • BBQ: max(1.5*k, oversample*k) = max(1.5*10, 3*10) = 30

Because of the difference in num_candidates, we scan a greater part of the HNSW graph when using BBQ, which yields better candidates in the top 30 that are then reranked with float32. That is how BBQ ends up with better ranking quality. You can set num_candidates to suit your needs (e.g., see here), or, like most users, trust our benchmarks and rely on the defaults.
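As a hypothetical example, overriding the default on a per-query basis looks like this (same assumed client and embedding helper as in the earlier filtered-query sketch):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
query_embedding = embed("evidence on climate change")  # assumed embedding helper

response = es.search(
    index="articles",                 # hypothetical index
    knn={
        "field": "embedding",
        "query_vector": query_embedding,
        "k": 10,
        "num_candidates": 100,        # explore more of the graph per shard than the default
    },
)
```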

This addresses a question I received from a consultant: "Is it better to use e5-large with BBQ or e5-small with float32?" The answer is now unequivocally clear. Using a more powerful model like e5-large with BBQ gives you the best of all worlds:

  • Better ranking: From the superior e5-large model, further enhanced by BBQ.
  • Lower latency: From the highly efficient BBQ search process.
  • Lower cost: From the 32x memory reduction.

It’s a win-win-win, demonstrating that you don't have to trade quality for cost.

Because of this proven superiority, we have made BBQ the default quantization method for dense vectors of 384 dimensions or higher in Elasticsearch 9.1. We recommend this for most modern embedding models, which tend to distribute vectors well across the available space, making them ideal for BBQ's approach.
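As a sketch of what this looks like in a mapping (the index name is hypothetical; with the 9.1 defaults you can omit index_options entirely for vectors of 384+ dimensions, and the explicit setting below only makes the choice visible):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="articles",                                   # hypothetical index
    mappings={
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 384,                            # e.g. the e5-small output size
                "index_options": {"type": "bbq_hnsw"},  # BBQ-quantized HNSW index
            }
        }
    },
)
```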

Get started today

These advancements in ACORN and BBQ empower you to build more powerful, scalable, and cost-effective AI applications on Elastic. You can execute complex, filtered queries at high speed while simultaneously improving ranking relevance and dramatically reducing memory costs.

Upgrade to Elasticsearch 9.1 to take advantage of these new capabilities.

We handle the complexity so you can focus on building incredible search experiences. Happy searching.

Ready to try this out on your own? Start a free trial.

Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!
