RAG made easy with Spring AI + Elasticsearch

Customizing your AI chatbot experience with private data

Spring AI recently added Elasticsearch as a vector store, and the Elastic team contributed optimizations to it. We're excited to show how easy and intuitive it is to build a complete RAG application in Java using Spring AI and the Elasticsearch vector database. Developers can now combine Spring’s modular capabilities with Elasticsearch’s advanced retrieval and AI tools to rapidly build Spring Boot applications for enterprise use cases.

In this blog, we will build a Retrieval-Augmented Generation (RAG) Java application with Spring AI, using the new Elasticsearch vector store integration for document storage and retrieval. You'll learn how to configure a Maven project, set up all the necessary dependencies, and integrate Elasticsearch as a vector store. We will also guide you through reading and tokenizing a PDF document, sending it to Elasticsearch, and querying it using AI models to provide accurate and contextually relevant information. Let’s get going!

Disclaimer

The spring-ai-elasticsearch artifact is still in technical preview and only available in the Spring Milestones repository. For this reason, we discourage using the provided code in any production environment until the official release.

Prerequisites

  • Elasticsearch version >= 8.14.0
  • Java version >= 17
  • Any LLM supported by Spring AI (complete list)

Use case: Runewars

Runewars is a miniature game with a fairly complex set of rules explained in its 40-page manual, and going back to it a few years after the last match means having forgotten most of them. Let's ask ChatGPT (version GPT-4o) for a refresher:

That's not just generic, it's wrong: reward cards must be hidden from other players. It's clear that it does not know the rules to this game, so let's augment the model with the rules!

Demo goal

Have an AI chat model answer questions about the Runewars rules and report, along with each response, the manual page where it found the information. The code used to make all of this work is available on GitHub.

Project configuration

We're going to create a new Java project using Apache Maven as a build tool, so let's set up the POM accordingly, starting from the addition of the Milestones and Snapshot Spring repositories, as explained in Spring AI's getting started:

  <repositories>
    <repository>
      <id>spring-milestones</id>
      <name>Spring Milestones</name>
      <url>https://repo.spring.io/milestone</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>spring-snapshots</id>
      <name>Spring Snapshots</name>
      <url>https://repo.spring.io/snapshot</url>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>
  </repositories>

We also need to import the Spring AI BOM:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-M3</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

We're going to rely on Spring Boot autoconfiguration to set up the beans we need:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-spring-boot-autoconfigure</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>

And now the specific modules for Elasticsearch and an embedding model, for example OpenAI:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-elasticsearch-store</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-openai</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>

Lastly, a PDF reader to ingest the game manual, also provided by Spring:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-pdf-document-reader</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>

The full POM can be found here.

Beans

All the Spring beans needed to run the application can be autowired since, for this case, we don't need any specific configuration that would require creating the beans ourselves. The only thing we have to do is provide the necessary information in the application.properties file:

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.chat.client.enabled=true

spring.elasticsearch.uris=${ES_SERVER_URL}
spring.elasticsearch.username=${ES_USERNAME}
spring.elasticsearch.password=${ES_PASSWORD}
spring.ai.vectorstore.elasticsearch.initialize-schema=true

If these properties are correctly set, the Spring framework will automatically choose the correct implementations of the vector store and embedding/chat model classes. If you're following along with a different LLM, be sure to configure the vector dimension of its embedding model using spring.ai.vectorstore.elasticsearch.dimensions. For example, OpenAI's vector dimension is 1536, which is the default value, so we don't need to set the property.
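A dimension mismatch between the embedding model and the index tends to surface only when documents are indexed, so a fail-fast check can be handy. The following is a minimal plain-Java sketch of that idea; `DimensionCheck` and `requireDimensions` are hypothetical names invented for illustration and are not part of Spring AI or the Elasticsearch client.

```java
class DimensionCheck {

    // Hypothetical helper: fails fast if an embedding does not match the
    // dimension the index was configured with.
    static void requireDimensions(float[] embedding, int configuredDims) {
        if (embedding.length != configuredDims) {
            throw new IllegalStateException(
                "Embedding has " + embedding.length
                    + " dimensions, but the index expects " + configuredDims);
        }
    }

    public static void main(String[] args) {
        // 1536 is the default value of spring.ai.vectorstore.elasticsearch.dimensions
        int configuredDims = 1536;

        // A vector of the configured size passes silently
        requireDimensions(new float[1536], configuredDims);

        // A 1024-dimensional vector (e.g. from a different embedding model) is rejected
        try {
            requireDimensions(new float[1024], configuredDims);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```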

Refer to the official Elasticsearch Vector Store documentation for more information on all the possible configuration parameters.

Service

First of all, create a new Service class where the vector store and chat client beans will be autowired:

@Service
public class RagService {

    private ElasticsearchVectorStore vectorStore;
    private ChatClient chatClient;

    public RagService(ElasticsearchVectorStore vectorStore, ChatClient.Builder clientBuilder) {
        this.vectorStore = vectorStore;
        this.chatClient = clientBuilder.build();
    }
}

It will have two methods:

  • One reads a PDF file from a given path, converts it into the Spring AI Document format and sends it to Elasticsearch.
  • The other queries Elasticsearch for the documents relevant to the question and then provides them to the LLM so that it can give an accurate response.

Content ingestion

Let's start with the first one:

public void ingestPDF(String path) {
    // Spring AI utility class to read a PDF file page by page
    PagePdfDocumentReader pdfReader = new PagePdfDocumentReader(path);
    List<Document> docbatch = pdfReader.read();

    // Splitting each page into smaller chunks
    docbatch = new TokenTextSplitter().apply(docbatch);

    // Sending the batch of chunks to the vector store
    vectorStore.doAdd(docbatch);
}

Notice how the batch of documents went through a splitting step before being sent to the vector store: each page is divided into smaller chunks, sized in tokens, which the embedding model and the LLM can handle more efficiently than whole pages. Spring AI provides the TokenTextSplitter, which can be customized to tweak the size and desired number of chunks; in this case the default configuration is enough, so our pages will be divided into chunks of roughly 800 tokens.
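To give a feel for what a splitter does, here is a toy sketch in plain Java. It is not how TokenTextSplitter works internally: it counts whitespace-separated words rather than model tokens, and `WordChunker` and its `split` method are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a text splitter: counts whitespace-separated words,
// whereas Spring AI's TokenTextSplitter counts model tokens.
class WordChunker {

    // Splits text into consecutive chunks of at most maxWords words each.
    static List<String> split(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String page = "Reward cards must be hidden from other players";
        // Prints one chunk per line, each holding at most three words
        split(page, 3).forEach(System.out::println);
    }
}
```

Smaller chunks make each embedding more focused, at the cost of more vectors to store and search.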

This seems too simple: are we just sending strings to a database? As with anything Spring-related, there's a lot happening underneath, hidden by the high level of abstraction. The Documents are sent to the embedding model to be embedded, that is, converted into a numerical representation of their content called a vector. The documents are then indexed, together with their embeddings, into the Elasticsearch vector database, which is optimized to handle this type of data when ingesting and querying.

Querying

The second method will implement the user's interaction with the Chat Client:

public String queryLLM(String question) {

    // Querying the vector store for documents related to the question
    List<Document> vectorStoreResult =
        vectorStore.doSimilaritySearch(SearchRequest.query(question).withTopK(5).withSimilarityThreshold(0.0));

    // Merging the documents into a single string
    String documents = vectorStoreResult.stream()
        .map(Document::getContent)
        .collect(Collectors.joining(System.lineSeparator()));

    // Setting the prompt with the context
    String prompt = """
        You're assisting with providing the rules of the tabletop game Runewars.
        Use the information from the DOCUMENTS section to provide accurate answers to the
        question in the QUESTION section.
        If unsure, simply state that you don't know.

        DOCUMENTS:
        """ + documents + System.lineSeparator()
        + """
        QUESTION:
        """ + question;


    // Calling the chat model with the question
    String response = chatClient.prompt()
        .user(prompt)
        .call()
        .content();

    return response +
        System.lineSeparator() +
        "Found at page: " +
        // Retrieving the first ranked page number from the document metadata
        vectorStoreResult.get(0).getMetadata().get(PagePdfDocumentReader.METADATA_START_PAGE_NUMBER) +
        " of the manual";
}

The question gets sent to the Elasticsearch vector store first, so that it can reply with the documents it deems most relevant to the query. How does it do that? As the name of the called method suggests, by performing a similarity search or, in more detail, a kNN (k-nearest neighbors) search: in short, the question is embedded as well, its embedding is compared with the embeddings of the documents, and the closest ones are returned.
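The idea behind a kNN search can be sketched in a few lines of plain Java. This is a brute-force illustration under simplifying assumptions, not what Elasticsearch runs internally (it relies on approximate search structures for speed); the two-dimensional vectors and the `BruteForceKnn` class are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class BruteForceKnn {

    // Position of a document in the input list and its similarity to the query
    record Hit(int index, double score) {}

    // Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // similarity with a zero vector is treated as 0 in this sketch
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every document against the query and returns the k best, highest score first
    static List<Hit> topK(float[] query, List<float[]> docs, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            hits.add(new Hit(i, cosine(query, docs.get(i))));
        }
        hits.sort((x, y) -> Double.compare(y.score(), x.score()));
        return hits.subList(0, Math.min(k, hits.size()));
    }

    public static void main(String[] args) {
        List<float[]> docs = List.of(
            new float[] {1f, 0f},
            new float[] {0f, 1f},
            new float[] {1f, 1f});
        float[] question = {1f, 0f};
        topK(question, docs, 2).forEach(System.out::println);
    }
}
```

Comparing the query against every stored vector is fine for a handful of documents, but it scales linearly with the index size, which is why vector databases use approximate methods instead.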

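In this case, we want the answer to be precise and the chance of hallucination to be as low as possible, which is why the prompt tells the model to rely on the retrieved documents and to admit when it doesn't know. The withSimilarityThreshold parameter is set to 0.0, which accepts matches of any score, so the search always returns the closest results it can find; raising it would filter out weak matches. Also, considering the nature of the data (a manual), we know that there won't be many repetitions, so we expect to find what we want in no more than 5 different pages, hence the withTopK parameter set to 5.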
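How the two parameters interact can be shown with a small plain-Java sketch. It assumes the threshold acts as a minimum score and that results arrive sorted from best to worst; the scores and the `ThresholdFilter` class are invented for the example and do not come from a real search.

```java
import java.util.List;

class ThresholdFilter {

    // A retrieved chunk with the similarity score assigned by the search
    record ScoredChunk(String text, double score) {}

    // Keeps at most topK chunks whose score is at least the threshold.
    // The input is assumed to be sorted by descending score.
    static List<ScoredChunk> select(List<ScoredChunk> ranked, int topK, double threshold) {
        return ranked.stream()
            .filter(chunk -> chunk.score() >= threshold)
            .limit(topK)
            .toList();
    }

    public static void main(String[] args) {
        List<ScoredChunk> ranked = List.of(
            new ScoredChunk("chunk A", 0.9),
            new ScoredChunk("chunk B", 0.6),
            new ScoredChunk("chunk C", 0.2));

        // A threshold of 0.0 filters nothing: all three chunks are kept
        System.out.println(select(ranked, 5, 0.0).size());

        // A threshold of 0.5 drops the weakest chunk
        System.out.println(select(ranked, 5, 0.5).size());
    }
}
```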

Controller

The easiest way to test a Spring service is to build a basic RestController that calls it:

@RestController
@RequestMapping("rag")
public class RagController {

    private final RagService ragService;

    @Autowired
    public RagController(RagService ragService) {
        this.ragService = ragService;
    }

    @PostMapping("/ingestPdf")
    public ResponseEntity<String> ingestPDF(@RequestBody String path) {
        try {
            ragService.ingestPDF(path);
            return ResponseEntity.ok().body("Done!");
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return ResponseEntity.internalServerError().build();
        }
    }

    @PostMapping("/query")
    public ResponseEntity<String> query(@RequestBody String question) {
        try {
            String response = ragService.queryLLM(question);
            return ResponseEntity.ok().body(response);
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return ResponseEntity.internalServerError().build();
        }
    }
}

Running Elasticsearch

Connecting to an Elastic Cloud instance is the fastest way to test your applications, but if you don't have access to one, no problem! You can get started with a local instance of Elasticsearch using start-local, a script that leverages Docker to quickly configure and run both the server and a Kibana instance.

curl -fsSL https://elastic.co/start-local | sh

Running the application

We're done with the code! Let's start the application on the familiar 8080 port and call it using curl (laziness is really the key theme here):

curl -XPOST "http://localhost:8080/rag/ingestPdf" --header "Content-Type: text/plain" --data "where-you-downloaded-the-pdf"

Remember that embedding is an expensive operation, and using a less powerful LLM means that this call could take a while to complete.
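If you'd rather drive the endpoints from Java than from curl, the JDK's built-in HTTP client can send the same plain-text POST. This is only a sketch: `RagHttpExample` and `textPost` are hypothetical names, and the URL assumes the application is running locally on port 8080 as above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RagHttpExample {

    // Builds a POST request with a plain-text body, mirroring the curl call
    static HttpRequest textPost(String url, String body) {
        return HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("Content-Type", "text/plain")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Same placeholder as in the curl command: replace it with the PDF location
        HttpRequest request = textPost(
            "http://localhost:8080/rag/ingestPdf",
            "where-you-downloaded-the-pdf");

        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```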

Finally, the question we asked at the start:

curl -XPOST "http://localhost:8080/rag/query" --header "Content-Type: text/plain" --data "where do you place the reward card after obtaining it?"
In Runewars, after a hero receives a Reward card, the controlling player draws the top card from the Reward deck, looks at it, and places it facedown under the Hero card of the hero who received it. 
The player does not flip the Reward card faceup until they wish to use its ability. Found at page 27 of the manual.

Wonderful, isn't it? A chatbot well versed in the intricate rules of Runewars, ready to answer all of our questions.

Bonus: Ollama

We can easily use another language model by changing a few lines of the configuration, thanks to Spring AI's abstraction. Let's replace OpenAI with a local instance of Ollama, starting from the POM dependency:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-ollama</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>

Then the properties in application.properties:

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.init.pull-model-strategy=always
spring.ai.chat.client.enabled=true

spring.elasticsearch.uris=${ES_SERVER_URL}
spring.elasticsearch.username=${ES_USERNAME}
spring.elasticsearch.password=${ES_PASSWORD}
spring.ai.vectorstore.elasticsearch.initialize-schema=true
spring.ai.vectorstore.elasticsearch.dimensions=1024

The pull-model-strategy property will conveniently pull the default models for you; if you already have everything set up, disable it by setting it to never. Also remember to check the correct vector dimension: for example, it's 1024 for mxbai-embed-large, the default Ollama embedding model.

That's it, everything else is unchanged! Of course, changing the embedding model means that the Elasticsearch index must be changed as well, since the old embeddings are incompatible with the new ones.

Conclusion

Following these steps, you should be able to set up a fully functioning RAG application with the same complexity as a basic CRUD Spring-based application. The complete code can be found here. For any questions or issues, reach out to us on our Discuss page.
