Apache Lucene 10 is out! Improvements to Lucene's hardware efficiency & more

Apache Lucene 10 was just released, with a focus on hardware efficiency! Check out the main release highlights.

Apache Lucene 10 is finally out! With more than 2,000 commits from 185 unique contributors since Lucene 9.0 - which was released in December 2021, almost 3 years ago - a lot has been going on. To be fair, a majority of these changes have been delivered in 9.x minor releases. However, the most ambitious changes usually need a major version, such as the introduction of multi-dimensional points in Lucene 6.0, dynamic pruning in 8.0, or vector search in 9.0. In 10.0, the area of focus for Lucene has been hardware efficiency, ie. making Lucene better at taking advantage of modern hardware. Let me guide you through the main release highlights:

More search parallelism

For many years now, Lucene has had the ability to parallelize search execution, by creating groups of segments, searching each group in a different thread, and combining results in the end. One downside of this approach is that it couples the index geometry - how the index is organized into segments - and search parallelism. For instance, an index that has been force-merged down to a single segment can no longer take advantage of multiple execution threads to be searched. Quite disappointing when modern CPUs commonly have tens of cores!

To overcome this limitation, Lucene's query evaluation logic now allows splitting the index into logical partitions, that no longer need to be aligned with segments. For instance, an index that has been force-merged down to a single segment could still be sliced into 10 logical partitions that have one tenth of the documents of this segment each.

This change will help increase search parallelism, especially on machines that have many cores and/or on indexes that have few segments on their highest tier. This change doesn't work nicely with queries that have a high cost for creating a Scorer yet - such as range queries and prefix queries, but we're hoping to lift this limitation in upcoming minor releases.

Better I/O parallelism

Until now, Lucene would use synchronous I/O and perform at most one I/O operation at a time per search thread. For indexes that significantly exceed the size of the page cache, this could lead to queries being bound on I/O latency, while the host is still very far from maxing out IOPS. Frustrating!

To overcome this, Lucene's Directory abstraction introduced a new IndexInput#prefetch API, to let the OS know about regions of files that it is about to read. The OS can then parallelize retrieving pages that intersect with these regions, within a single OS thread. For instance, a BooleanQuery with TermQuery clauses would now perform the I/O of terms dictionary lookups in parallel and then retrieve the first couple pages of each postings list in parallel, within a single execution thread. MMapDirectory, Lucene's default Directory implementation, implements this prefetch API using madvise's MADV_WILLNEED advice on Linux and Mac OS.

We are very excited about this change, which has already proved to help on fast local NVMe disks, and will further help on storage systems that have worse latencies while retaining good parallelism such as network-attached disks (GCP persistent storage, Amazon EBS, Azure managed disks) or even object storage (GCP Cloud storage, Amazon S3, Azure blob storage).

Better CPU efficiency and storage efficiency with sparse indexing

Lucene 10 introduces support for sparse indexing, sometimes called primary-key indexing or zone indexing in other data stores. The idea is simple: if your data is stored on disk in sorted order, then you can organize it into blocks, record the minimum and maximum values per block, and your queries will be able to take advantage of this information to skip blocks that don't intersect with the query, or to fully match blocks that are contained by the query. Only blocks that partially intersect with the query will need further inspection, and the challenge consists of picking the best index sort that will minimize the number of such blocks.

Lucene's sparse indexes are currently implemented via 4 levels of blocks that have 4k, 32k, 256k and 2M docs each respectively.

When done right, this form of indexing is extremely space-efficient (only a few bytes per block) and CPU-efficient (can make a decision about whether thousands of documents match or not with only a few CPU instructions). The downside is that the index can only be stored in a single order on disk, so not all fields can benefit from it. Typically, the index would be sorted on the main dimensions of the data. For instance, for an e-commerce catalog containing products, these dimensions could be the category and the brand of the products.

Conclusion

Note that some hardware-efficiency-related changes have also been released in 9.x minor releases. In particular, it's worth highlighting that:

We are pretty excited about this new Lucene release and the hardware-efficiency focus. In case you are curious to learn more about these improvements, we will be writing more detailed blogs about them in the coming weeks. Stay tuned.

Ready to try this out on your own? Start a free trial.

Elasticsearch and Lucene offer strong vector database and search capabilities. Dive into our sample notebooks to learn more.

Related content

Apache Lucene 9.9, the fastest Lucene release ever

December 7, 2023

Apache Lucene 9.9, the fastest Lucene release ever

Lucene 9.9 brings major speedups to query evaluation. Here are the performance improvements observed in nightly benchmarks & optimization resources.

Elasticsearch vs. OpenSearch: Vector Search Performance Comparison

Elasticsearch vs. OpenSearch: Vector Search Performance Comparison

Elasticsearch is out-of-the-box 2x–12x faster than OpenSearch for vector search

Bringing maximum-inner-product into Lucene

September 1, 2023

Bringing maximum-inner-product into Lucene

Explore how we brought maximum-inner-product into Lucene and the investigations undertaken to ensure its support.

Understanding Int4 scalar quantization in Lucene

April 25, 2024

Understanding Int4 scalar quantization in Lucene

This blog explains how int4 quantization works in Lucene, how it lines up, and the benefits of using int4 quantization.

Making Lucene faster with vectorization and FFI/madvise

April 17, 2024

Making Lucene faster with vectorization and FFI/madvise

Discover how modern Java features, including vectorization and FFI/madvise, are speeding up Lucene's performance.

Ready to build state of the art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as your are. Let’s connect and work together to build the magical search experience that will get you the results you want.

Try it yourself