Apache Lucene has seen significant activity in 2024, with numerous releases including the first major update in three years, packed with exciting improvements and new features. Let’s explore some of the key highlights.
Community
A project is only as strong as the community that supports it. Despite more than 20 years of development, the Lucene project remains vibrant and thrives thanks to its passionate and active contributors.
In 2024, the Lucene project has seen more than 2,000 commits from 98 unique contributors, and almost 800 pull requests. The number of contributors continues to grow, with new committers and PMC members joining the project and helping drive its success.
Lucene 10
2024 saw the first major release in almost 3 years, Lucene 10, with more than 2,000 commits from 185 unique contributors. While Lucene's development model allows many improvements and features to be delivered in minor releases, a major release affords the opportunity to bring larger features and modernizations. For example, Lucene 10 requires a minimum of Java 21. Bumping the minimum Java version ensures that Lucene can continue to take advantage of the improvements that modern Java provides.
The primary focus of Lucene 10 is to better utilize the hardware on which it runs. Let's take a quick look at some of the main highlights:
- More search parallelism - while search execution is already parallelized across segments, we now go further, parallelizing within segments. This decouples the on-disk representation from execution performance, allowing even a single segment to benefit from the number of cores on modern systems.
- Better I/O parallelism - the straightforward synchronous I/O model that Lucene uses has been enhanced with a prefetch stage. This informs the OS that a region of an index file will be needed in the very near future, while not blocking the calling thread.
- Better CPU and storage efficiency with sparse indexing - Lucene 10 introduces support for sparse indexing, sometimes called primary-key indexing or zone indexing in other data stores.
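To make the first highlight more concrete, here is a minimal conceptual sketch of intra-segment parallelism. It is plain Java and deliberately does not use Lucene's API: the class `IntraSegmentSearchSketch`, the `DocRange` record, and the `partition` and `countMatches` helpers are all hypothetical names invented for illustration. The idea it shows is that a single segment's doc ID space can be split into contiguous ranges, each range evaluated on its own thread, and the partial results merged afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

/** Conceptual sketch only: plain Java, not Lucene's actual API. */
final class IntraSegmentSearchSketch {

    /** A half-open range of doc IDs [from, to) inside one segment (hypothetical type). */
    record DocRange(int from, int to) {}

    /** Splits the doc ID space [0, maxDoc) into at most `parts` contiguous ranges (parts >= 1). */
    static List<DocRange> partition(int maxDoc, int parts) {
        List<DocRange> ranges = new ArrayList<>();
        int size = Math.max(1, (maxDoc + parts - 1) / parts); // ceiling division
        for (int from = 0; from < maxDoc; from += size) {
            ranges.add(new DocRange(from, Math.min(maxDoc, from + size)));
        }
        return ranges;
    }

    /** Counts matching docs in one segment, evaluating each range as a separate task. */
    static int countMatches(int maxDoc, IntPredicate matches, int parts, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<Integer>> futures = new ArrayList<>();
        for (DocRange range : partition(maxDoc, parts)) {
            futures.add(pool.submit(() -> {
                int count = 0;
                for (int doc = range.from(); doc < range.to(); doc++) {
                    if (matches.test(doc)) {
                        count++;
                    }
                }
                return count;
            }));
        }
        // Merge the per-range partial counts into one result for the segment.
        int total = 0;
        for (Future<Integer> future : futures) {
            total += future.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // A stand-in "query": every third document matches.
            int hits = countMatches(1_000_000, doc -> doc % 3 == 0, 4, pool);
            System.out.println("hits = " + hits);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the ranges are derived from doc IDs rather than from how the index was written, the degree of parallelism no longer depends on how many segments exist on disk, which is the decoupling described above.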
For more information about Lucene 10, check out the dedicated article on Lucene 10.
Research and innovation
In 2024, Lucene has seen a surge of research and innovation, particularly in the areas of machine learning integration, vector search, and optimization for large-scale datasets, with references from 10 separate research papers and publications. Some of the key research areas and developments include:
- Vector Search and Embedding Support - Lucene provides a powerful and scalable solution for vector-based search, enabling semantic retrieval at scale. By leveraging Lucene's robust indexing and search infrastructure, users can combine the best of traditional text search with the advanced capabilities of modern vector search, making Lucene a comprehensive solution for a wide range of search and information retrieval tasks.
- Hybrid Search Models - Research has also delved into hybrid search techniques, where Lucene combines traditional keyword-based search with modern vector-based retrieval. By merging term-based indexes with dense vector representations, Lucene can deliver more accurate and contextually relevant search results, bridging the gap between the precision of traditional search engines and the flexibility of semantic search.
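As a rough illustration of how keyword and vector results might be merged, the sketch below fuses two ranked lists with reciprocal rank fusion (RRF). This is an assumption made for the example: the research mentioned above is not tied to any particular fusion method here, and RRF is simply one widely used option. The class `HybridFusionSketch`, the `fuse` method, and the document IDs are hypothetical, and the code is plain Java rather than Lucene's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual sketch only: reciprocal rank fusion over two ranked lists of doc IDs. */
final class HybridFusionSketch {

    /**
     * Each document earns 1 / (k + rank) from every list it appears in (rank starts at 1).
     * Documents are returned by descending fused score, ties broken by ID.
     */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort(Comparator
                .comparingDouble((String id) -> scores.get(id))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return fused;
    }

    public static void main(String[] args) {
        List<String> keywordHits = List.of("a", "b", "c"); // e.g. from a term-based query
        List<String> vectorHits = List.of("c", "a", "d");  // e.g. from a nearest-neighbor query
        // k = 60 is a conventional default for RRF, assumed here.
        System.out.println(fuse(List.of(keywordHits, vectorHits), 60));
    }
}
```

Documents that rank well in both lists ("a" and "c" in this example) rise above documents that appear in only one, which is the intuition behind combining the two retrieval styles.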
The ongoing research efforts in 2024 demonstrate Lucene’s adaptability to the evolving needs of modern search technologies, particularly in the context of AI, semantic search, and big data applications. The project continues to grow as a powerful, flexible, and efficient platform for both traditional and cutting-edge search use cases.
So many releases
Although release count is not an exact measure of activity, the sheer volume of releases highlights the ongoing dedication and energy of the community. These updates include major enhancements to vector search performance and efficiency, support for madvise, optimizations for postings list decoding, further speed improvements through SIMD, and much more.
Here’s the full list of releases:
- 10.1.0 (2024-12-20)
- 9.12.1 (2024-12-13)
- 10.0.0 (2024-10-14)
- 9.12.0 (2024-09-28)
- 8.11.4 (2024-09-24)
- 9.11.1 (2024-06-27)
- 9.11.0 (2024-06-06)
- 9.10.0 (2024-02-20)
- 8.11.3 (2024-02-08)
- 9.9.2 (2024-01-29)
You can find more information and release notes at the Lucene Core page. Additionally, there are equivalent PyLucene releases.
Wrapping up
As Lucene matures, it continues to flourish thanks to its dedicated and vibrant community. As we’ve seen, 2024 has been an incredibly productive year, and we now look ahead to the exciting developments that 2025 will bring.