Lucene Wrapped 2024

2024 has been another major year for Apache Lucene. In this blog, we’ll explore the key highlights.

Apache Lucene saw significant activity in 2024, with numerous releases, including the first major release in almost three years, packed with improvements and new features.

Community

A project is only as strong as the community that supports it. Despite more than 20 years of development, the Lucene project remains vibrant and thrives thanks to its passionate and active contributors.

In 2024, the Lucene project has seen more than 2,000 commits from 98 unique contributors, and almost 800 pull requests. The number of contributors continues to grow, with new committers and PMC members joining the project and helping drive its success.

Lucene 10

2024 saw the first major release in almost three years: Lucene 10, with more than 2,000 commits from 185 unique contributors. While Lucene's development model allows many improvements and features to be delivered in minor releases, a major release offers the opportunity to bring larger features and modernizations. For example, Lucene 10 requires a minimum of Java 21. Bumping the minimum Java version ensures that Lucene can continue to take advantage of the improvements that modern Java provides.

The primary focus of Lucene 10 is to better utilize the hardware on which it runs. Let's take a quick look at some of the main highlights:

  • More search parallelism - while search execution was already parallelized across segments, we now go further and parallelize within segments. This decouples the on-disk representation from execution performance, allowing even a single segment to benefit from the many cores on modern systems.
  • Better I/O parallelism - the straightforward synchronous I/O model that Lucene uses has been enhanced with a prefetch stage. This informs the OS that a region of an index file will be needed in the very near future, while not blocking the calling thread.
  • Better CPU and storage efficiency with sparse indexing - Lucene 10 introduces support for sparse indexing, sometimes called primary-key indexing or zone indexing in other data stores.
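
To make the first point more concrete, here is a minimal sketch of the idea behind searching within a single segment in parallel. It is a toy model, not Lucene's API: the names `split_into_slices`, `search_slice` and `search_segment` are hypothetical, a segment is reduced to a plain list of per-document scores, and the threads only illustrate the structure (split the doc ID range, search each slice, merge the partial top hits).

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def split_into_slices(max_doc, num_slices):
    """Split the doc ID range [0, max_doc) into contiguous, near-equal slices."""
    size, extra = divmod(max_doc, num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when max_doc < num_slices
            slices.append((start, end))
        start = end
    return slices


def search_slice(scores, doc_range, k):
    """Return the top-k (score, doc_id) pairs found within one slice."""
    start, end = doc_range
    return heapq.nlargest(k, ((scores[d], d) for d in range(start, end)))


def search_segment(scores, k, num_slices=4):
    """Search one 'segment' by scoring its slices concurrently, then merging."""
    slices = split_into_slices(len(scores), num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = pool.map(lambda r: search_slice(scores, r, k), slices)
        # Each slice returns its own top-k; the global top-k is among them.
        return heapq.nlargest(k, (hit for part in partials for hit in part))


if __name__ == "__main__":
    # Hypothetical per-document scores for a single segment of 10 documents.
    segment_scores = [0.1, 0.9, 0.4, 0.7, 0.3, 0.8, 0.2, 0.6, 0.5, 0.05]
    print(search_segment(segment_scores, k=3))
```

The merged result is the same regardless of how many slices are used, which is the sense in which the on-disk layout (one large segment or many small ones) no longer dictates how much of the machine a query can use.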

For more information, check out the dedicated article on Lucene 10.

Research and innovation

In 2024, Lucene has seen a surge of research and innovation, particularly in the areas of machine learning integration, vector search, and optimization for large-scale datasets, with references from 10 separate research papers and publications. Some of the key research areas and developments include:

  • Vector Search and Embedding Support - Lucene provides a powerful and scalable solution for vector-based search, enabling semantic retrieval at scale. By leveraging Lucene's robust indexing and search infrastructure, users can combine the best of traditional text search with the advanced capabilities of modern vector search, making Lucene a comprehensive solution for a wide range of search and information retrieval tasks.
  • Hybrid Search Models - Research has also delved into hybrid search techniques, where Lucene combines traditional keyword-based search with modern vector-based retrieval. By merging term-based indexes with dense vector representations, Lucene can deliver more accurate and contextually relevant search results, bridging the gap between the precision of traditional search engines and the flexibility of semantic search.
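
As an illustration of how keyword and vector results can be merged, the sketch below uses reciprocal rank fusion (RRF), one common fusion technique. This is an assumption chosen for the example, not a method described in this post or attributed to Lucene itself; the function name `reciprocal_rank_fusion`, the document IDs and the constant `k=60` (a conventional default for RRF) are all illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. Ties are broken by doc ID for determinism.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    # Hypothetical results: one list from a keyword query, one from a vector query.
    keyword_hits = ["a", "b", "c"]
    vector_hits = ["c", "a", "d"]
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents that rank well in both lists ("a" and "c" here) rise to the top, while documents found by only one retriever are still kept, just lower down.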

The ongoing research efforts in 2024 demonstrate Lucene’s adaptability to the evolving needs of modern search technologies, particularly in the context of AI, semantic search, and big data applications. The project continues to grow as a powerful, flexible, and efficient platform for both traditional and cutting-edge search use cases.

So many releases

Although not an exact measure of activity, the sheer number of releases highlights the ongoing dedication and energy of the community. These releases include major enhancements to vector search performance and efficiency, support for madvise, optimizations for postings list decoding, further speed improvements through SIMD, and much more.

You can find the full list of releases and their release notes at the Lucene Core page. Additionally, there are equivalent PyLucene releases.

Wrapping up

As Lucene matures, it continues to flourish thanks to its dedicated and vibrant community. As we’ve seen, 2024 has been an incredibly productive year, and we now look ahead to the exciting developments that 2025 will bring.

