The LangGraph Retrieval Agent Template is a starter project developed by LangChain to facilitate the creation of retrieval-based question-answering systems using LangGraph in LangGraph Studio. This template is pre-configured to integrate seamlessly with Elasticsearch, enabling developers to rapidly build agents that can index and retrieve documents efficiently.
This blog focuses on running and customizing the LangGraph Retrieval Agent Template using LangGraph Studio and the LangGraph CLI. The template provides a framework for building retrieval-augmented generation (RAG) applications, leveraging various retrieval backends such as Elasticsearch.
We will walk you through setting up, configuring the environment, and executing the template efficiently with Elastic while customizing the agent flow.
Prerequisites
Before proceeding, ensure you have the following installed:
- An Elastic Cloud deployment or an on-prem Elasticsearch deployment, version 8.0.0 or higher (or create a 14-day free trial on Elastic Cloud)
- Python 3.9+
- Access to an LLM provider such as Cohere (used in this guide), OpenAI, or Anthropic/Claude
Creating the LangGraph app
1. Install the LangGraph CLI
pip install --upgrade "langgraph-cli[inmem]"
2. Create LangGraph app from retrieval-agent-template
mkdir lg-agent-demo
cd lg-agent-demo
langgraph new lg-agent-demo
You will be presented with an interactive menu that will allow you to choose from a list of available templates. Select 4 for Retrieval Agent and 1 for Python, as shown below:

- Troubleshooting: If you encounter the error "urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)>", run Python's "Install Certificates" command to resolve the issue, as shown below.

3. Install dependencies
In the root of your new LangGraph app, create a virtual environment and install the dependencies in editable mode so your local changes are used by the server:
#For Mac
python3 -m venv lg-demo
source lg-demo/bin/activate
pip install -e .
#For Windows
python3 -m venv lg-demo
lg-demo\Scripts\activate
pip install -e .
Setting up the environment
1. Create a .env file
The .env file holds API keys and configurations so the app can connect to your chosen LLM and retrieval provider. Generate a new .env file by duplicating the example configuration:
cp .env.example .env
2. Configure the .env file
The .env file comes with a set of default configurations. You can update it by adding the necessary API keys and values based on your setup. Any keys that aren't relevant to your use case can be left unchanged or removed.
# To separate your traces from other applications
LANGSMITH_PROJECT=retrieval-agent
# LLM choice (set the API key for your selected provider):
ANTHROPIC_API_KEY=your_anthropic_api_key
FIREWORKS_API_KEY=your_fireworks_api_key
OPENAI_API_KEY=your_openai_api_key
# Retrieval provider (configure based on your chosen service):
## Elastic Cloud:
ELASTICSEARCH_URL=https://your_elastic_cloud_url
ELASTICSEARCH_API_KEY=your_elastic_api_key
## Elastic Local:
ELASTICSEARCH_URL=http://host.docker.internal:9200
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
## Pinecone:
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=your_pinecone_index_name
## MongoDB Atlas:
MONGODB_URI=your_mongodb_connection_string
# Cohere API key:
COHERE_API_KEY=your_cohere_api_key
- Example .env file (using Elastic Cloud and Cohere)
Below is a sample .env configuration for using Elastic Cloud as the retrieval provider and Cohere as the LLM, as demonstrated in this blog:
# To separate your traces from other applications
LANGSMITH_PROJECT=retrieval-agent
#Retrieval Provider
# Elasticsearch configuration
ELASTICSEARCH_URL=elastic-url:443
ELASTICSEARCH_API_KEY=elastic_api_key
# Cohere API key
COHERE_API_KEY=cohere_api_key
Note: While this guide uses Cohere for both response generation and embeddings, you're free to use other LLM providers such as OpenAI, Claude, or even a local LLM model depending on your use case. Make sure that each key you intend to use is present and correctly set in the .env file.
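Optionally, you can sanity-check that the Elasticsearch values in your .env file are valid before moving on. This quick check is not part of the template; it is a small sketch using the official elasticsearch Python client (pip install elasticsearch), assuming the variables are exported in your shell:
# Optional sanity check (not part of the template): verify the Elasticsearch
# credentials from your .env using the official Python client.
import os
from elasticsearch import Elasticsearch

es = Elasticsearch(
    os.environ["ELASTICSEARCH_URL"],
    api_key=os.environ["ELASTICSEARCH_API_KEY"],
)
# Prints the cluster version if the URL and API key are valid.
print(es.info()["version"]["number"])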
3. Update configuration file - configuration.py
After setting up your .env file with the appropriate API keys, the next step is to update your application's default model configuration. Updating the configuration ensures the system uses the services and models you've specified in your .env file.
Navigate to the configuration file:
cd src/retrieval_graph
The configuration.py file contains the default model settings used by the retrieval agent for three main tasks:
- Embedding model – converts documents into vector representations
- Query model – processes the user’s query into a vector
- Response model – generates the final response
By default, the code uses models from OpenAI (e.g., openai/text-embedding-3-small) and Anthropic (e.g., anthropic/claude-3-5-sonnet-20240620 and anthropic/claude-3-haiku-20240307).
In this blog, we're switching to Cohere models. If you're already using OpenAI or Anthropic, no changes are needed.
Example changes (using Cohere):
Open configuration.py and modify the model defaults as shown below:
…
    embedding_model: Annotated[
        str,
        {"__template_metadata__": {"kind": "embeddings"}},
    ] = field(
        default="cohere/embed-english-v3.0",
…
    response_model: Annotated[str, {"__template_metadata__": {"kind": "llm"}}] = field(
        default="cohere/command-r-08-2024",
…
    query_model: Annotated[str, {"__template_metadata__": {"kind": "llm"}}] = field(
        default="cohere/command-r-08-2024",
        metadata={
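These values follow a provider/model naming convention. The template resolves such strings with its load_chat_model helper; the sketch below illustrates the general idea using LangChain's init_chat_model and may differ from the template's exact implementation:
# Illustrative sketch of how a "provider/model" string can be turned into a chat
# model; the template's own load_chat_model helper may differ in detail.
from langchain.chat_models import init_chat_model
from langchain_core.language_models import BaseChatModel

def load_chat_model(fully_specified_name: str) -> BaseChatModel:
    """Load a chat model such as 'cohere/command-r-08-2024'."""
    provider, model = fully_specified_name.split("/", maxsplit=1)
    return init_chat_model(model, model_provider=provider)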
Running the Retrieval Agent with LangGraph CLI
1. Launch LangGraph server
cd lg-agent-demo
langgraph dev
This will start up the LangGraph API server locally. If this runs successfully, you should see something like:

Open the Studio UI URL.
There are two graphs available:
- Retrieval Graph: Retrieves data from Elasticsearch and responds to the query using an LLM.
- Indexer Graph: Indexes documents into Elasticsearch and generates embeddings using the configured embedding model.
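Both graphs are declared in the project's langgraph.json, which is how the CLI and Studio discover them. A representative configuration is shown below; the exact module paths and graph names may differ in your generated project, so check the file rather than copying this verbatim:
{
  "dependencies": ["."],
  "graphs": {
    "indexer": "./src/retrieval_graph/index_graph.py:graph",
    "retrieval_graph": "./src/retrieval_graph/graph.py:graph"
  },
  "env": ".env"
}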


2. Configuring the Indexer Graph
- Open the Indexer Graph.
- Click Manage Assistants.
- Click on 'Add New Assistant', enter the user details as specified, and then close the window.
{"user_id": "101"}


3. Indexing sample documents
- Index the following sample documents, which represent a hypothetical quarterly report for the organization NoveTech:
[
{ "page_content": "NoveTech Solutions Q1 2025 Report - Revenue: $120.5M, Net Profit: $18.2M, EPS: $2.15. Strong AI software launch and $50M government contract secured."
},
{
"page_content": "NoveTech Solutions Business Highlights - AI-driven analytics software gained 15% market share. Expansion into Southeast Asia with two new offices. Cloud security contract secured."
},
{
"page_content": "NoveTech Solutions Financial Overview - Operating expenses at $85.3M, Gross Margin 29.3%. Stock price rose from $72.5 to $78.3. Market Cap reached $5.2B."
},
{
"page_content": "NoveTech Solutions Challenges - Rising supply chain costs impacting hardware production. Regulatory delays slowing European expansion. Competitive pressure in cybersecurity sector."
},
{
"page_content": "NoveTech Solutions Future Outlook - Expected revenue for Q2 2025: $135M. New AI chatbot and blockchain security platform launch planned. Expansion into Latin America."
},
{
"page_content": "NoveTech Solutions Market Performance - Year-over-Year growth at 12.7%. Stock price increase reflects investor confidence. Cybersecurity and AI sectors remain competitive."
},
{
"page_content": "NoveTech Solutions Strategic Moves - Investing in R&D to enhance AI-driven automation. Strengthening partnerships with enterprise cloud providers. Focusing on data privacy solutions."
},
{
"page_content": "NoveTech Solutions CEO Statement - 'NoveTech Solutions continues to innovate in AI and cybersecurity. Our growth strategy remains strong, and we foresee steady expansion in the coming quarters.'"
}
]
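If you prefer to drive the indexer from code instead of pasting the documents into Studio, the sketch below uses the LangGraph Python SDK against the local dev server. The graph name "indexer", the "docs" input key, and the server URL are assumptions based on the template defaults; adjust them to match your project:
# Hedged sketch: index the sample documents through the LangGraph Python SDK.
# Assumes `langgraph dev` is running locally and the indexer graph accepts a
# "docs" list, as in the retrieval-agent-template.
import asyncio
from langgraph_sdk import get_client

docs = [
    {"page_content": "NoveTech Solutions Q1 2025 Report - Revenue: $120.5M, Net Profit: $18.2M, EPS: $2.15. Strong AI software launch and $50M government contract secured."},
    # ...add the remaining sample documents from the list above
]

async def main() -> None:
    client = get_client(url="http://localhost:2024")  # default `langgraph dev` address
    thread = await client.threads.create()
    result = await client.runs.wait(
        thread["thread_id"],
        "indexer",  # graph name as registered in langgraph.json
        input={"docs": docs},
        config={"configurable": {"user_id": "101"}},
    )
    print(result)

asyncio.run(main())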
Once the documents are indexed, you will see a delete message in the thread, as shown below.

4. Running the Retrieval Graph
- Switch to the Retrieval Graph.
- Enter the following search query:
What was NoveTech Solutions' total revenue in Q1 2025?

The system will return relevant documents and provide an exact answer based on the indexed data.
Customize the Retrieval Agent
To enhance the user experience, we introduce a customization step in the Retrieval Graph to predict the next three questions a user might ask. This prediction is based on:
- Context from the retrieved documents
- Previous user interactions
- Last user query
The following code changes are required to implement the Query Prediction feature:
1. Update graph.py
- Add the predict_query function:
async def predict_query(
    state: State, *, config: RunnableConfig
) -> dict[str, list[BaseMessage]]:
    """Suggest the next question the user is likely to ask."""
    configuration = Configuration.from_runnable_config(config)
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", configuration.predict_next_question_prompt),
            ("placeholder", "{messages}"),
        ]
    )
    model = load_chat_model(configuration.response_model)
    user_query = state.queries[-1] if state.queries else "No prior query available"
    retrieved_docs = format_docs(state.retrieved_docs)
    message_value = await prompt.ainvoke(
        {
            "messages": state.messages,
            "previous_queries": "\n".join(state.queries),
            "user_query": user_query,  # Use the most recent query as the primary input
            "retrieved_docs": retrieved_docs,
            "system_time": datetime.now(tz=timezone.utc).isoformat(),
        },
        config,
    )
    next_question = await model.ainvoke(message_value, config)
    return {"next_question": [next_question]}
- Modify the respond function to return a response object instead of a message:
async def respond(
    state: State, *, config: RunnableConfig
) -> dict[str, list[BaseMessage]]:
    """Call the LLM powering our "agent"."""
    configuration = Configuration.from_runnable_config(config)
    # Feel free to customize the prompt, model, and other logic!
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", configuration.response_system_prompt),
            ("placeholder", "{messages}"),
        ]
    )
    model = load_chat_model(configuration.response_model)
    retrieved_docs = format_docs(state.retrieved_docs)
    message_value = await prompt.ainvoke(
        {
            "messages": state.messages,
            "retrieved_docs": retrieved_docs,
            "system_time": datetime.now(tz=timezone.utc).isoformat(),
        },
        config,
    )
    response = await model.ainvoke(message_value, config)
    # We return a list, because this will get added to the existing list
    return {"response": [response]}
- Update the graph structure to add a new node and edge for predict_query, so that the prediction runs after the response has been generated:
builder.add_node(generate_query)
builder.add_node(retrieve)
builder.add_node(respond)
builder.add_node(predict_query)
builder.add_edge("__start__", "generate_query")
builder.add_edge("generate_query", "retrieve")
builder.add_edge("retrieve", "respond")
builder.add_edge("respond", "predict_query")
2. Update prompts.py
- Craft the prompt for query prediction in prompts.py:
PREDICT_NEXT_QUESTION_PROMPT = """Given the user query and the retrieved documents, suggest the most likely next question the user might ask.
**Context:**
- Previous Queries:
{previous_queries}
- Latest User Query: {user_query}
- Retrieved Documents:
{retrieved_docs}
**Guidelines:**
1. Do not suggest a question that has already been asked in previous queries.
2. Consider the retrieved documents when predicting the next logical question.
3. If the user's query is already fully answered, suggest a relevant follow-up question.
4. Keep the suggested question natural and conversational.
5. Suggest at least 3 questions.
System time: {system_time}"""
3. Update configuration.py
- Add predict_next_question_prompt:
    predict_next_question_prompt: str = field(
        default=prompts.PREDICT_NEXT_QUESTION_PROMPT,
        metadata={"description": "The system prompt used for predicting the next question."},
    )
4. Update state.py
- Add the following attributes:
    response: Annotated[Sequence[AnyMessage], add_messages]
    next_question: Annotated[Sequence[AnyMessage], add_messages]
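For reference, the sketch below shows where these fields sit in state.py, together with the imports they rely on. The surrounding dataclass and its existing fields come from the template and are elided here; the default_factory values are an assumption added so the new fields satisfy dataclass field ordering:
# Minimal sketch of the State additions in state.py; the existing template
# fields (queries, retrieved_docs, messages, ...) are omitted.
from dataclasses import dataclass, field
from typing import Annotated, Sequence

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages

@dataclass(kw_only=True)
class State:
    # ...existing template fields go here...
    response: Annotated[Sequence[AnyMessage], add_messages] = field(default_factory=list)
    next_question: Annotated[Sequence[AnyMessage], add_messages] = field(default_factory=list)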
5. Re-run the Retrieval Graph
- Enter the following search query again:
What was NoveTech Solutions' total revenue in Q1 2025?
The system will process the input and predict three related questions that users might ask, as shown below.

Conclusion
Integrating the Retrieval Agent template within LangGraph Studio and CLI provides several key benefits:
- Accelerated development: The template and visualization tools streamline the creation and debugging of retrieval workflows, reducing development time.
- Seamless deployment: Built-in support for APIs and auto-scaling ensures smooth deployment across environments.
- Easy updates: Modifying workflows, adding new functionalities, and integrating additional nodes is simple, making it easier to scale and enhance the retrieval process.
- Persistent memory: The system retains agent states and knowledge, improving consistency and reliability.
- Flexible workflow modeling: Developers can customize retrieval logic and communication rules for specific use cases.
- Real-time interaction and debugging: The ability to interact with running agents allows for efficient testing and issue resolution.
By leveraging these features, organizations can build powerful, efficient, and scalable retrieval systems that enhance data accessibility and user experience.
The full source code for this project is available on GitHub.
Ready to try this out on your own? Start a free trial.
Elasticsearch has integrations for tools from LangChain, Cohere and more. Join our Beyond RAG Basics webinar to build your next GenAI app!