With the general availability of semantic_text search in Elasticsearch 8.18 and 9.0, our focus has primarily centered on simplicity, enhancing speed and storage, and broadening usability to the match, knn, and sparse_vector queries. Now, it’s time to complete this story by offering additional customization for advanced and expert use cases. This includes specifying the quantization method to use for vectors (including customizing BBQ) and configuring chunking settings to control how semantic_text breaks long input into smaller pieces of data to send to inference models.
With this in mind, we have delivered additional features that allow finer control over both chunking settings and quantization configuration.
Customizing chunking settings
When we introduced semantic_text, one of its most powerful features was that it handled chunking transparently for you, without any additional configuration at the field level. This was possible because each configured inference endpoint has chunking settings attached to it. When ingesting documents with semantic_text, long documents were automatically split into chunks based on that inference endpoint configuration.
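For reference, endpoint-level chunking is defined when the inference endpoint is created. Here’s a minimal sketch of what that can look like, assuming a hypothetical endpoint named my-e5-endpoint backed by the built-in elasticsearch service and the .multilingual-e5-small model:
// Illustrative endpoint; the endpoint name and model choice here are just examples
PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}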
While this default chunking configuration works well for most use cases, some scenarios call for more granular control over chunking. So, we introduced configurable chunking settings for semantic_text that, when set, override the default behavior configured on the inference endpoint.
It’s easy to set this up through your mappings:
PUT my-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 250,
          "sentence_overlap": 1
        }
      }
    }
  }
}
In this example, we are using a sentence-based chunking strategy. This splits data into chunks that incorporate complete sentences, up to the length of max_chunk_size, which in this example is 250 words. We can also define a sentence_overlap, which allows the same sentence to appear in multiple chunks. Additionally, we support a word-based chunking strategy, which splits data by individual words rather than sentence boundaries, and a none strategy to completely disable chunking. The latter is useful if you perform chunking before sending data to Elasticsearch and want to preserve those chunks without further modification. More information on how chunking works can be found in our documentation and in this Search Labs blog.
Since chunking settings are applied at request time, when documents are ingested, you can update them at any time via a mapping update command:
PUT my-index/_mapping
{
  "properties": {
    "my_semantic_field": {
      "type": "semantic_text",
      "chunking_settings": {
        "strategy": "word",
        "max_chunk_size": 250,
        "overlap": 100
      }
    }
  }
}
Please note, however, that as with any other update to field mappings in Elasticsearch, this will not impact already-indexed documents. If you decide to change your chunking configuration and want all of your documents to reflect the updated chunking settings, you will have to reindex those documents.
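One way to do that, sketched here under the assumption that you first create a new index (called my-index-v2 in this example) with the updated chunking settings in its mapping, is the Reindex API:
// Illustrative only: my-index-v2 is a hypothetical index created with the new chunking settings
POST _reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-v2"
  }
}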
“Bring your own chunks”
Another chunking configuration we’ve been asked about quite a bit is the ability to completely disable chunking in semantic_text fields, for users who want to apply their own chunking strategies before indexing documents into semantic_text. One example is a model-based chunking strategy that splits Markdown or another type of data; in this scenario, you would perform your chunking first, before you ingest content into Elasticsearch.
The good news is that this is now supported with the none chunking strategy:
PUT my-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "none"
        }
      }
    }
  }
}
When using the none chunking strategy, you can index your pre-chunked data explicitly by sending each chunk separately in an array:
PUT my-index/_doc/1
{
  "my_semantic_field": [
    "These aren't the droids you're looking for",
    "He's free to go around"
  ]
}
Providing pre-chunked input is an expert use case, and it’s important to size your chunks based on the model’s token limit. Sending chunks that exceed the model’s token limit may result in errors or document truncation, depending on the service and model used.
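Searching a pre-chunked field works just like any other semantic_text field. For example, a simple match query (a hypothetical sketch against the document indexed above) searches across those chunks:
// Illustrative query against the pre-chunked field above
GET my-index/_search
{
  "query": {
    "match": {
      "my_semantic_field": "droids"
    }
  }
}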
BBQ and other quantization configuration
Traditionally, semantic_text has relied on good default quantization strategies for text embedding models. That default gets an update in 8.19 and 9.1, where new indices using semantic_text will default to our state-of-the-art BBQ HNSW quantization strategy for all compatible text embedding models. We’re confident that this is the right choice for most use cases because of how much better text embedding models have been shown to perform with BBQ.
However, we realize that you may want to update these settings - maybe you’re not ready for BBQ, or you want to try out some even better improvements to quantization in the future. Well, we’ve got your back!
For text embedding models (dense vectors), we support setting these quantization index options using any supported dense_vector configuration. For example, here is how to set an index to use bbq_flat (a brute-force search algorithm on top of binary quantization, instead of the default HNSW strategy with BBQ):
PUT my-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "inference_id": "my-text-embedding-model",
        "index_options": {
          "dense_vector": {
            "type": "bbq_flat"
          }
        }
      }
    }
  }
}
These index_options mappings are also updatable, provided that you are updating to a compatible quantization method.
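For instance, a mapping update along these lines (a sketch, assuming a move from bbq_flat to the default bbq_hnsw is a compatible update in your deployment) would switch the quantization method used for newly indexed data:
// Illustrative update: switching the example field from bbq_flat to bbq_hnsw
PUT my-index/_mapping
{
  "properties": {
    "my_semantic_field": {
      "type": "semantic_text",
      "inference_id": "my-text-embedding-model",
      "index_options": {
        "dense_vector": {
          "type": "bbq_hnsw"
        }
      }
    }
  }
}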
More coming soon
You may have noticed that the index_options API specifies dense_vector index options for quantization. In the future, we will support additional index options for more than just text embedding models, such as sparse_vector.
Try it out yourself
With these exciting updates, semantic search in Elasticsearch using semantic_text is both easy to use out of the box and configurable for more expert use cases. These enhancements are native to Elasticsearch, will work with future improvements to both chunking and quantization, and are already available in Serverless! They’ll be available in stack-hosted Elasticsearch starting with versions 8.19 and 9.1.
Try it out today!