Elasticsearch 8.16 has a new functionality that allows you to upload PDF files directly into Kibana and analyze them using Playground. In this article, we'll see how to use this functionality by uploading a resume in PDF format and then using Playground to interact with it.
Playground is a low-code platform hosted in Kibana, that allows you to create a RAG application and chat with your content. You can read more about it in this article and even test it using this link.
Steps:
- Configure the Elasticsearch Inference Service Endpoint
- Upload PDFs to Kibana
- Interact with the data in Playground
Configure the Elasticsearch Inference Service Endpoint
To run semantic searches, we must first configure an inference endpoint. In this example, we'll use the Elasticsearch Inference Endpoint. This endpoint offers:
- rerank
- sparse embedding
- text embedding
For this example, let's select sparse embedding:
PUT _inference/sparse_embedding/my-elser-model
{
"service": "elasticsearch",
"service_settings": {
"adaptive_allocations": {
"enabled": true,
"min_number_of_allocations": 1,
"max_number_of_allocations": 10
},
"num_threads": 1,
"model_id": ".elser_model_2"
}
}
Once configured, confirm that the model was correctly loaded into Kibana by checking Search > Relevance > Inference Endpoint in the Kibana UI.
Upload PDFs to Kibana
We'll upload the resume of a junior developer to learn how to use the Kibana upload files functionality.
Go to the Kibana UI and follow these steps:
Next, for Import Data, we have two options:
Simple: This is the default option and it allows us to quickly upload our PDF into the index and automatically creates a data view with the indexed info.
Advanced: This option allows us to customize mappings or add ingest pipelines. Within these settings you can:
- Add a semantic text type of field.
- Index Settings: If you want to configure things like shards or analyzers.
- Index Mappings: If you want to change a field type or how you define your data.
- Ingest Pipeline: If you want to make changes to your data before indexing it.
Go to "Advanced" and select "Add additional field":
Select the field attachment.content
; in “copy to field” type "content" and make sure that the inference endpoint is my-elser-model:
The field Copy to is used to copy the content from attachment.content
to a new semantic_text
field of (content), which automatically generates vector embeddings using the underlying Inference endpoint (Elastic’s ELSER in this case). This makes both the semantic and text fields available so you can run full-text, semantic, or hybrid searches.
Once everything is configured, click on "Import":
Now that the index is created, we can explore it using Playground.
Interact with the data in Playground
Connect to Playground
After configuring the index and uploading the resumes, we now need to connect the index to Playground. Click Connect to an LLM and select one of the options.
Configure the chatbot
Once Playground has been configured and we have indexed Alex Johnson's resume, we can interact with the data. Using semantic search and LLMs we can ask questions using natural language and get answers even if the documents don't have the keywords we used in the query, like in the example below:
Using the instructions menu, we can control the chatbot behavior and define features like the response format. It can also include citations, to make sure the answer is properly grounded.
If we go to the "Query" tab, we can see the query generated by Playground and we add both a text
and a semantic_text
fields, Playground will automatically generate a hybrid query to normalize the score between different types of different types of queries.
Playground not only answers questions but also helps us understand the internal components of a RAG system, like querying, retrieval phase, context and prompt instructions.
Give it a try!
With the Elasticsearch 8.16 update, we can easily upload PDF/Word/Powerpoint files using the Kibana UI. It can automatically create an index in the simple mode, and you can use the advanced mode to customize your index and tailor it to your needs.
Once your files are uploaded, you can access Playground and quickly and easily chat with them since Playground will handle the LLM interactions and provide the best query based on the type of fields you want to search.
Ready to try this out on your own? Start a free trial.
Elasticsearch has integrations for tools from LangChain, Cohere and more. Join our advanced semantic search webinar to build your next GenAI app!
Related content
January 20, 2025
Navigating graphs for Retrieval-Augmented Generation using Elasticsearch
Discover how to use Knowledge Graphs to enhance RAG results while storing the graph efficiently in Elasticsearch. This guide explores a detailed strategy for dynamically generating knowledge subgraphs tailored to a user’s query.
January 17, 2025
How to ingest data to Elasticsearch through Apache Airflow
Learn how to ingest data to Elasticsearch through Apache Airflow.
January 16, 2025
Jira connector tutorial part II: 6 optimization tips
After connecting Jira to Elasticsearch, we'll now review best practices to escalate this deployment.
January 15, 2025
Jira connector tutorial part I
Indexing our Jira content into Elasticsearch to create a unified data source and do search with Document Level Security.
January 21, 2025
High Quality RAG with Aryn DocPrep, DocParse and Elasticsearch vector database
Learn how to achieve high-quality RAG with effective data preparation using Aryn.ai DocParse, DocPrep, and Elasticsearch vector database.