With the general availability of semantic_text search in Elasticsearch 8.18 and 9.0, our focus has primarily centered on simplicity, enhancing speed and storage, and broadening usability to the match, knn, and sparse_vector queries. Now, it’s time to complete this story by offering additional customization for advanced and expert use cases. This includes specifying the quantization method to use for vectors (including customizing BBQ) and configuring chunking settings to control how semantic_text breaks long input into smaller pieces of data to send to inference models.
With this in mind, we have delivered additional features that allow finer control over both chunking settings and quantization configuration.
Customizing chunking settings
When we introduced semantic_text, one of its most powerful features was that it handled chunking transparently for you, without any additional configuration at the field level. This was possible because each configured inference endpoint has chunking settings attached to it. When ingesting documents with semantic_text, long documents were automatically split into chunks based on that inference endpoint configuration.
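For reference, endpoint-level chunking is defined when the inference endpoint is created. Here’s a minimal sketch of what that can look like, assuming a hypothetical endpoint named my-e5-endpoint backed by the built-in elasticsearch service and the .multilingual-e5-small model:
// Illustrative endpoint; the endpoint name and model choice here are just examples
PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}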
While this default chunking configuration works well for most use cases, some scenarios call for more granular control over chunking. So, we introduced configurable chunking settings for semantic_text that, when set, override the default behavior configured on the inference endpoint.
It’s easy to set this up through your mappings:
PUT my-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 250,
          "sentence_overlap": 1
        }
      }
    }
  }
}
In this example, we are using a sentence-based chunking strategy. This splits data into chunks that incorporate complete sentences, up to the length of max_chunk_size, which in this example is 250 words. We can also define a sentence_overlap, which allows the same sentence to appear in multiple chunks. Additionally, we support a word-based chunking strategy, which splits data by individual words rather than sentence boundaries, and a none strategy to completely disable chunking. The latter is useful if you perform chunking before sending data to Elasticsearch and want to preserve those chunks without further modification. More information on how chunking works can be found in our documentation and in this Search Labs blog.
Since chunking settings are applied at request time, when documents are ingested, you can update them at any time via a mapping update command:
PUT my-index/_mapping
{
  "properties": {
    "my_semantic_field": {
      "type": "semantic_text",
      "chunking_settings": {
        "strategy": "word",
        "max_chunk_size": 250,
        "overlap": 100
      }
    }
  }
}
Please note, however, that as with any other update to field mappings in Elasticsearch, this will not impact already-indexed documents. If you decide to change your chunking configuration and want all of your documents to reflect the updated chunking settings, you will have to reindex those documents.
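One way to do that, sketched here under the assumption that you first create a new index (called my-index-v2 in this example) with the updated chunking settings in its mapping, is the Reindex API:
// Illustrative only: my-index-v2 is a hypothetical index created with the new chunking settings
POST _reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-v2"
  }
}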
“Bring your own chunks”
Another chunking configuration we’ve been asked about quite a bit is the ability to completely disable chunking in semantic_text fields, for users who want to apply their own chunking strategies before indexing documents into semantic_text. One example is a model-based chunking strategy that splits Markdown or another type of data; in this scenario, you would perform your chunking first, before you ingest content into Elasticsearch.
The good news is that this is now supported with the none chunking strategy:
PUT my-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "none"
        }
      }
    }
  }
}
When using the none chunking strategy, you can index your pre-chunked data explicitly by sending each chunk separately in an array:
PUT my-index/_doc/1
{
  "my_semantic_field": [
    "These aren't the droids you're looking for",
    "He's free to go around"
  ]
}
Providing pre-chunked input is an expert use case, and it’s important to size your chunks based on the model’s token limit. Sending chunks that exceed the model’s token limit may result in errors or document truncation, depending on the service and model used.
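Searching a pre-chunked field works just like any other semantic_text field. For example, a simple match query (a hypothetical sketch against the document indexed above) searches across those chunks:
// Illustrative query against the pre-chunked field above
GET my-index/_search
{
  "query": {
    "match": {
      "my_semantic_field": "droids"
    }
  }
}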
BBQ and other quantization configuration
Traditionally, semantic_text has relied on good default quantization strategies for text embedding models. That default gets an update in 8.19 and 9.1, where new indices using semantic_text will default to our state-of-the-art BBQ HNSW quantization strategy for all compatible text embedding models. We’re confident that this is the right choice for most use cases because of how much better text embedding models have been shown to perform with BBQ.
However, we realize that you may want to update these settings - maybe you’re not ready for BBQ, or you want to try out some even better improvements to quantization in the future. Well, we’ve got your back!
For text embedding models (dense vectors), we support setting these quantization index options using any supported dense_vector configuration. For example, here is how to set an index to use bbq_flat (a brute-force search algorithm on top of binary quantization, instead of the default HNSW strategy with BBQ):
PUT my-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "inference_id": "my-text-embedding-model",
        "index_options": {
          "dense_vector": {
            "type": "bbq_flat"
          }
        }
      }
    }
  }
}
These index_options mappings are also updatable, provided that you are updating to a compatible quantization method.
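For instance, a mapping update along these lines (a sketch, assuming a move from bbq_flat to the default bbq_hnsw is a compatible update in your deployment) would switch the quantization method used for newly indexed data:
// Illustrative update: switching the example field from bbq_flat to bbq_hnsw
PUT my-index/_mapping
{
  "properties": {
    "my_semantic_field": {
      "type": "semantic_text",
      "inference_id": "my-text-embedding-model",
      "index_options": {
        "dense_vector": {
          "type": "bbq_hnsw"
        }
      }
    }
  }
}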
More coming soon
You may have noticed that the index_options API specifies dense_vector index options for quantization. In the future, we will support additional index options for more than just text embedding models, such as sparse_vector.
Try it out yourself
With these exciting updates, semantic search in Elasticsearch using semantic_text is both easy to use out of the box and configurable for more expert use cases. These enhancements are native to Elasticsearch, will work with future improvements to both chunking and quantization, and are already available in Serverless! They’ll be available in stack-hosted Elasticsearch starting with versions 8.19 and 9.1.
Try it out today!