The LangGraph Retrieval Agent Template is a starter project developed by LangChain to facilitate the creation of retrieval-based question-answering systems using LangGraph in LangGraph Studio. This template is pre-configured to integrate seamlessly with Elasticsearch, enabling developers to rapidly build agents that can index and retrieve documents efficiently.
This blog focuses on running and customizing the LangGraph Retrieval Agent Template using LangGraph Studio and the LangGraph CLI. The template provides a framework for building retrieval-augmented generation (RAG) applications, leveraging various retrieval backends such as Elasticsearch.
We will walk you through setting up, configuring the environment, and executing the template efficiently with Elastic while customizing the agent flow.
Prerequisites
Before proceeding, ensure you have the following installed:
- An Elastic Cloud deployment or an on-prem Elasticsearch deployment, version 8.0.0 or higher (or create a 14-day free trial on Elastic Cloud)
- Python 3.9+
- Access to an LLM provider such as Cohere (used in this guide), OpenAI, or Anthropic/Claude
Creating the LangGraph app
1. Install the LangGraph CLI
pip install --upgrade "langgraph-cli[inmem]"
2. Create LangGraph app from retrieval-agent-template
mkdir lg-agent-demo
cd lg-agent-demo
langgraph new lg-agent-demo
You will be presented with an interactive menu that will allow you to choose from a list of available templates. Select 4 for Retrieval Agent and 1 for Python, as shown below:

- Troubleshooting: If you encounter the error "urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)>", run Python's "Install Certificates" command to resolve the issue, as shown below.

3. Install dependencies
In the root of your new LangGraph app, create a virtual environment and install the dependencies in editable mode so your local changes are used by the server:
#For Mac
python3 -m venv lg-demo
source lg-demo/bin/activate
pip install -e .
#For Windows
python3 -m venv lg-demo
lg-demo\Scripts\activate
pip install -e .
Setting up the environment
1. Create a .env file
The .env file holds API keys and configurations so the app can connect to your chosen LLM and retrieval provider. Generate a new .env file by duplicating the example configuration:
cp .env.example .env
2. Configure the .env file
The .env file comes with a set of default configurations. You can update it by adding the necessary API keys and values based on your setup. Any keys that aren't relevant to your use case can be left unchanged or removed.
# To separate your traces from other applications
LANGSMITH_PROJECT=retrieval-agent
# LLM choice (set the API key for your selected provider):
ANTHROPIC_API_KEY=your_anthropic_api_key
FIREWORKS_API_KEY=your_fireworks_api_key
OPENAI_API_KEY=your_openai_api_key
# Retrieval provider (configure based on your chosen service):
## Elastic Cloud:
ELASTICSEARCH_URL=https://your_elastic_cloud_url
ELASTICSEARCH_API_KEY=your_elastic_api_key
## Elastic Local:
ELASTICSEARCH_URL=http://host.docker.internal:9200
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
## Pinecone:
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=your_pinecone_index_name
## MongoDB Atlas:
MONGODB_URI=your_mongodb_connection_string
# Cohere API key:
COHERE_API_KEY=your_cohere_api_key
- Example .env file (using Elastic Cloud and Cohere)
Below is a sample .env configuration for using Elastic Cloud as the retrieval provider and Cohere as the LLM, as demonstrated in this blog:
# To separate your traces from other applications
LANGSMITH_PROJECT=retrieval-agent
#Retrieval Provider
# Elasticsearch configuration
ELASTICSEARCH_URL=elastic-url:443
ELASTICSEARCH_API_KEY=elastic_api_key
# Cohere API key
COHERE_API_KEY=cohere_api_key
Note: While this guide uses Cohere for both response generation and embeddings, you're free to use other LLM providers such as OpenAI, Claude, or even a local LLM model depending on your use case. Make sure that each key you intend to use is present and correctly set in the .env file.
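Optionally, you can sanity-check that the Elasticsearch values in your .env file are valid before moving on. This quick check is not part of the template; it is a small sketch using the official elasticsearch Python client (pip install elasticsearch), assuming the variables are exported in your shell:
# Optional sanity check (not part of the template): verify the Elasticsearch
# credentials from your .env using the official Python client.
import os
from elasticsearch import Elasticsearch

es = Elasticsearch(
    os.environ["ELASTICSEARCH_URL"],
    api_key=os.environ["ELASTICSEARCH_API_KEY"],
)
# Prints the cluster version if the URL and API key are valid.
print(es.info()["version"]["number"])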
3. Update configuration file - configuration.py
After setting up your .env file with the appropriate API keys, the next step is to update your application's default model configuration. Updating the configuration ensures the system uses the services and models you've specified in your .env file.
Navigate to the configuration file:
cd src/retrieval_graph
The configuration.py file contains the default model settings used by the retrieval agent for three main tasks:
- Embedding model – converts documents into vector representations
- Query model – processes the user’s query into a vector
- Response model – generates the final response
By default, the code uses models from OpenAI (e.g., openai/text-embedding-3-small) and Anthropic (e.g., anthropic/claude-3-5-sonnet-20240620 and anthropic/claude-3-haiku-20240307).
In this blog, we're switching to Cohere models. If you're already using OpenAI or Anthropic, no changes are needed.
Example changes (using Cohere):
Open configuration.py and modify the model defaults as shown below:
…
    embedding_model: Annotated[
        str,
        {"__template_metadata__": {"kind": "embeddings"}},
    ] = field(
        default="cohere/embed-english-v3.0",
…
    response_model: Annotated[str, {"__template_metadata__": {"kind": "llm"}}] = field(
        default="cohere/command-r-08-2024",
…
    query_model: Annotated[str, {"__template_metadata__": {"kind": "llm"}}] = field(
        default="cohere/command-r-08-2024",
        metadata={
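These values follow a provider/model naming convention. The template resolves such strings with its load_chat_model helper; the sketch below illustrates the general idea using LangChain's init_chat_model and may differ from the template's exact implementation:
# Illustrative sketch of how a "provider/model" string can be turned into a chat
# model; the template's own load_chat_model helper may differ in detail.
from langchain.chat_models import init_chat_model
from langchain_core.language_models import BaseChatModel

def load_chat_model(fully_specified_name: str) -> BaseChatModel:
    """Load a chat model such as 'cohere/command-r-08-2024'."""
    provider, model = fully_specified_name.split("/", maxsplit=1)
    return init_chat_model(model, model_provider=provider)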
Running the Retrieval Agent with LangGraph CLI
1. Launch LangGraph server
cd lg-agent-demo
langgraph dev
This will start up the LangGraph API server locally. If this runs successfully, you should see something like:

Open the Studio UI URL.
There are two graphs available:
- Retrieval Graph: Retrieves data from Elasticsearch and responds to the query using an LLM.
- Indexer Graph: Indexes documents into Elasticsearch and generates embeddings using the configured embedding model.
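Both graphs are declared in the project's langgraph.json, which is how the CLI and Studio discover them. A representative configuration is shown below; the exact module paths and graph names may differ in your generated project, so check the file rather than copying this verbatim:
{
  "dependencies": ["."],
  "graphs": {
    "indexer": "./src/retrieval_graph/index_graph.py:graph",
    "retrieval_graph": "./src/retrieval_graph/graph.py:graph"
  },
  "env": ".env"
}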


2. Configuring the Indexer Graph
- Open the Indexer Graph.
- Click Manage Assistants.
- Click on 'Add New Assistant', enter the user details as specified, and then close the window.
{"user_id": "101"}


3. Indexing sample documents
- Index the following sample documents, which represent a hypothetical quarterly report for the organization NoveTech:
[
{ "page_content": "NoveTech Solutions Q1 2025 Report - Revenue: $120.5M, Net Profit: $18.2M, EPS: $2.15. Strong AI software launch and $50M government contract secured."
},
{
"page_content": "NoveTech Solutions Business Highlights - AI-driven analytics software gained 15% market share. Expansion into Southeast Asia with two new offices. Cloud security contract secured."
},
{
"page_content": "NoveTech Solutions Financial Overview - Operating expenses at $85.3M, Gross Margin 29.3%. Stock price rose from $72.5 to $78.3. Market Cap reached $5.2B."
},
{
"page_content": "NoveTech Solutions Challenges - Rising supply chain costs impacting hardware production. Regulatory delays slowing European expansion. Competitive pressure in cybersecurity sector."
},
{
"page_content": "NoveTech Solutions Future Outlook - Expected revenue for Q2 2025: $135M. New AI chatbot and blockchain security platform launch planned. Expansion into Latin America."
},
{
"page_content": "NoveTech Solutions Market Performance - Year-over-Year growth at 12.7%. Stock price increase reflects investor confidence. Cybersecurity and AI sectors remain competitive."
},
{
"page_content": "NoveTech Solutions Strategic Moves - Investing in R&D to enhance AI-driven automation. Strengthening partnerships with enterprise cloud providers. Focusing on data privacy solutions."
},
{
"page_content": "NoveTech Solutions CEO Statement - 'NoveTech Solutions continues to innovate in AI and cybersecurity. Our growth strategy remains strong, and we foresee steady expansion in the coming quarters.'"
}
]
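If you prefer to drive the indexer from code instead of pasting the documents into Studio, the sketch below uses the LangGraph Python SDK against the local dev server. The graph name "indexer", the "docs" input key, and the server URL are assumptions based on the template defaults; adjust them to match your project:
# Hedged sketch: index the sample documents through the LangGraph Python SDK.
# Assumes `langgraph dev` is running locally and the indexer graph accepts a
# "docs" list, as in the retrieval-agent-template.
import asyncio
from langgraph_sdk import get_client

docs = [
    {"page_content": "NoveTech Solutions Q1 2025 Report - Revenue: $120.5M, Net Profit: $18.2M, EPS: $2.15. Strong AI software launch and $50M government contract secured."},
    # ...add the remaining sample documents from the list above
]

async def main() -> None:
    client = get_client(url="http://localhost:2024")  # default `langgraph dev` address
    thread = await client.threads.create()
    result = await client.runs.wait(
        thread["thread_id"],
        "indexer",  # graph name as registered in langgraph.json
        input={"docs": docs},
        config={"configurable": {"user_id": "101"}},
    )
    print(result)

asyncio.run(main())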
Once the documents are indexed, you will see a delete message in the thread, as shown below.

4. Running the Retrieval Graph
- Switch to the Retrieval Graph.
- Enter the following search query:
What was NoveTech Solutions' total revenue in Q1 2025?

The system will return relevant documents and provide an exact answer based on the indexed data.
Customize the Retrieval Agent
To enhance the user experience, we introduce a customization step in the Retrieval Graph to predict the next three questions a user might ask. This prediction is based on:
- Context from the retrieved documents
- Previous user interactions
- Last user query
The following code changes are required to implement the Query Prediction feature:
1. Update graph.py
- Add the predict_query function:
async def predict_query(
    state: State, *, config: RunnableConfig
) -> dict[str, list[BaseMessage]]:
    """Suggest the next question the user is likely to ask."""
    configuration = Configuration.from_runnable_config(config)
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", configuration.predict_next_question_prompt),
            ("placeholder", "{messages}"),
        ]
    )
    model = load_chat_model(configuration.response_model)
    user_query = state.queries[-1] if state.queries else "No prior query available"
    retrieved_docs = format_docs(state.retrieved_docs)
    message_value = await prompt.ainvoke(
        {
            "messages": state.messages,
            "previous_queries": "\n".join(state.queries),
            "user_query": user_query,  # Use the most recent query as the primary input
            "retrieved_docs": retrieved_docs,
            "system_time": datetime.now(tz=timezone.utc).isoformat(),
        },
        config,
    )
    next_question = await model.ainvoke(message_value, config)
    return {"next_question": [next_question]}
- Modify the respond function to return a response object instead of a message:
async def respond(
    state: State, *, config: RunnableConfig
) -> dict[str, list[BaseMessage]]:
    """Call the LLM powering our "agent"."""
    configuration = Configuration.from_runnable_config(config)
    # Feel free to customize the prompt, model, and other logic!
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", configuration.response_system_prompt),
            ("placeholder", "{messages}"),
        ]
    )
    model = load_chat_model(configuration.response_model)
    retrieved_docs = format_docs(state.retrieved_docs)
    message_value = await prompt.ainvoke(
        {
            "messages": state.messages,
            "retrieved_docs": retrieved_docs,
            "system_time": datetime.now(tz=timezone.utc).isoformat(),
        },
        config,
    )
    response = await model.ainvoke(message_value, config)
    # We return a list, because this will get added to the existing list
    return {"response": [response]}
- Update the graph structure to add a new node and edge for predict_query, so that the prediction runs after the response has been generated:
builder.add_node(generate_query)
builder.add_node(retrieve)
builder.add_node(respond)
builder.add_node(predict_query)
builder.add_edge("__start__", "generate_query")
builder.add_edge("generate_query", "retrieve")
builder.add_edge("retrieve", "respond")
builder.add_edge("respond", "predict_query")
2. Update prompts.py
- Craft the prompt for query prediction in prompts.py:
PREDICT_NEXT_QUESTION_PROMPT = """Given the user query and the retrieved documents, suggest the most likely next question the user might ask.
**Context:**
- Previous Queries:
{previous_queries}
- Latest User Query: {user_query}
- Retrieved Documents:
{retrieved_docs}
**Guidelines:**
1. Do not suggest a question that has already been asked in previous queries.
2. Consider the retrieved documents when predicting the next logical question.
3. If the user's query is already fully answered, suggest a relevant follow-up question.
4. Keep the suggested question natural and conversational.
5. Suggest at least 3 questions.
System time: {system_time}"""
3. Update configuration.py
- Add predict_next_question_prompt:
    predict_next_question_prompt: str = field(
        default=prompts.PREDICT_NEXT_QUESTION_PROMPT,
        metadata={"description": "The system prompt used for predicting the next question."},
    )
4. Update state.py
- Add the following attributes:
    response: Annotated[Sequence[AnyMessage], add_messages]
    next_question: Annotated[Sequence[AnyMessage], add_messages]
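For reference, the sketch below shows where these fields sit in state.py, together with the imports they rely on. The surrounding dataclass and its existing fields come from the template and are elided here; the default_factory values are an assumption added so the new fields satisfy dataclass field ordering:
# Minimal sketch of the State additions in state.py; the existing template
# fields (queries, retrieved_docs, messages, ...) are omitted.
from dataclasses import dataclass, field
from typing import Annotated, Sequence

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages

@dataclass(kw_only=True)
class State:
    # ...existing template fields go here...
    response: Annotated[Sequence[AnyMessage], add_messages] = field(default_factory=list)
    next_question: Annotated[Sequence[AnyMessage], add_messages] = field(default_factory=list)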
5. Re-run the Retrieval Graph
- Enter the following search query again:
What was NoveTech Solutions' total revenue in Q1 2025?
The system will process the input and predict three related questions that users might ask, as shown below.

Conclusion
Integrating the Retrieval Agent template within LangGraph Studio and CLI provides several key benefits:
- Accelerated development: The template and visualization tools streamline the creation and debugging of retrieval workflows, reducing development time.
- Seamless deployment: Built-in support for APIs and auto-scaling ensures smooth deployment across environments.
- Easy updates: Modifying workflows, adding new functionalities, and integrating additional nodes is simple, making it easier to scale and enhance the retrieval process.
- Persistent memory: The system retains agent states and knowledge, improving consistency and reliability.
- Flexible workflow modeling: Developers can customize retrieval logic and communication rules for specific use cases.
- Real-time interaction and debugging: The ability to interact with running agents allows for efficient testing and issue resolution.
By leveraging these features, organizations can build powerful, efficient, and scalable retrieval systems that enhance data accessibility and user experience.
The full source code for this project is available on GitHub.
Ready to try this out on your own? Start a free trial.
Elasticsearch has integrations for tools from LangChain, Cohere and more. Join our Beyond RAG Basics webinar to build your next GenAI app!