Elasticsearch Open Inference API adds support for Jina AI Embeddings and Reranking Models

Explore how to access Jina AI models using the Elasticsearch Open Inference API.

Our friends at Jina AI have added native integration for Jina AI’s embedding models and reranking products to the Elasticsearch Open Inference API. This includes support for industry-leading multilingual text embeddings and multilingual reranking, optimized for retrieval, clustering, and classification. The integration gives developers a high-performance, cost-effective toolkit for AI-powered information retrieval and semantic applications built on the Elasticsearch vector database and Jina AI.

With asymmetric embeddings for search and high-performance reranking models to enhance precision, Jina AI’s models bring top-shelf AI to Elasticsearch applications without additional integration or development costs.

This post explores how to access Jina AI models using the Elasticsearch Open Inference API.

About Jina AI Models

Founded in 2020, Jina AI is a leading search foundation company creating embeddings, rerankers, and small language models to help developers build reliable and high-quality multimodal search applications.

Jina Embeddings v3 is a multilingual embedding model from Jina AI that supports an input length of 8K tokens. Jina CLIP v2 is a multimodal, multilingual embedding model that supports text inputs of up to 8K tokens as well as images. Jina Reranker v2 is a multilingual neural reranker model, post-trained especially for agentic use cases. ReaderLM-v2 is a small language model that converts input data from various sources into Markdown or structured data formats suitable for interacting with LLMs.

Getting Started

We will be using the Kibana Dev Console to go through the setup. Alternatively, here is a Jupyter notebook to get you started.

First, you'll need a Jina AI API key. You can get a free key with a one million token usage limit here.

Jina AI makes several models available, but we recommend using the latest embedding model, jina-embeddings-v3, and their reranking model, jina-reranker-v2-base-multilingual.

Step 1: Creating a Jina AI inference API endpoint for generating embeddings

Create your text embedding inference endpoint in Elasticsearch by setting the service to jinaai. In service_settings, provide your Jina AI API key as api_key and set model_id to jina-embeddings-v3.

PUT _inference/text_embedding/jina_embeddings 
{
    "service": "jinaai",
    "service_settings": {
        "api_key": "<api-key>", 
        "model_id": "jina-embeddings-v3"
    }
}

Let’s test our Jina AI endpoint to validate the configuration by performing inference on a sample text.

POST _inference/text_embedding/jina_embeddings 
{
    "input": "Jina AI models are now supported natively in Elasticsearch."
}

Step 2: Creating a Jina AI inference API endpoint for reranking

Similarly, create a rerank task_type endpoint named jina_rerank for use during search. Use jinaai as the service, your Jina AI API key for api_key, and set model_id to jina-reranker-v2-base-multilingual in service_settings.

The task_settings section of the request sets the maximum number of documents for jina_rerank to return via the top_n setting, set here to 10. The return_documents setting tells jina_rerank to return a full copy of each candidate document it identifies.

PUT _inference/rerank/jina_rerank
{
    "service": "jinaai",
    "service_settings": {
        "api_key": "<api-key>",
        "model_id": "jina-reranker-v2-base-multilingual"
    },
    "task_settings": {
        "top_n": 10,
        "return_documents": true
    }
}

In the Kibana Dev Console, these commands should return a 200 response code, indicating that the services are correctly configured.
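
If you’d like to double-check the endpoint configurations later, you can retrieve them with the get inference API (the endpoint names below are the ones created in the steps above):

GET _inference/text_embedding/jina_embeddings
GET _inference/rerank/jina_rerank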

Step 3: Generating Embeddings (automagically)

Let’s create an index configured to use the jina_embeddings endpoint to generate the embeddings. We will create an index named film_index and have embeddings generated and stored automatically by using the semantic_text field type with jina_embeddings as the value for inference_id.

PUT film_index
{
  "mappings": {
    "properties": {
      "blurb": {
        "type": "semantic_text",
        "inference_id": "jina_embeddings"
      }
    }
  }
}
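
If you’d like to verify the field configuration before indexing, you can inspect the mapping with the standard mapping API (purely an optional sanity check):

GET film_index/_mapping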

Now, we can bulk-insert documents into the index. For this tutorial, we are using the films dataset below, which contains information about six films. Each document is a JSON object with a field labeled blurb.

PUT film_index/_bulk?pretty
{ "index" : { "_index" : "film_index" } }
{"title": "Casablanca", "director": "Michael Curtiz", "year": 1942, "runtime_min": 102, "genre": ["Drama", "Romance"], "blurb": "A cynical expatriate cafe owner struggles to choose between love and virtue in wartime Morocco"}
{ "index" : { "_index" : "film_index" } }
{"title": "2001: A Space Odyssey", "director": "Stanley Kubrick", "year": 1968, "runtime_min": 149, "genre": ["Sci-Fi", "Adventure"], "blurb": "Humanity finds a mysterious monolith on the moon that triggers a journey to Jupiter"}
{ "index" : { "_index" : "film_index" } }
{"title": "Parasite", "director": "Bong Joon-ho", "year": 2019, "runtime_min": 132, "genre": ["Thriller", "Drama"], "blurb": "A poor family schemes to become employed by a wealthy household with devastating consequences"}
{ "index" : { "_index" : "film_index" } }
{"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972, "runtime_min": 175, "genre": ["Crime", "Drama"], "blurb": "Aging patriarch of an organized crime dynasty transfers control to his reluctant son"}
{ "index" : { "_index" : "film_index" } }
{"title": "Inception", "director": "Christopher Nolan", "year": 2010, "runtime_min": 148, "genre": ["Sci-Fi", "Action"], "blurb": "A thief who enters people's dreams attempts to plant an idea in a CEO's subconscious"}
{ "index" : { "_index" : "film_index" } }
{"title": "The Grand Budapest Hotel", "director": "Wes Anderson", "year": 2014, "runtime_min": 99, "genre": ["Comedy", "Drama"], "blurb": "A legendary concierge teams up with a lobby boy to clear his name in a priceless painting theft"}

As the documents are indexed (drumroll please…), the Elasticsearch Open Inference API calls the jina_embeddings service to generate embeddings for the blurb text. Credit for this seamless developer experience goes to the semantic_text type and the Jina AI integration in the Elasticsearch Open Inference API.
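
To confirm that all six documents made it into the index before searching, a quick count query is enough (a simple sanity check, unrelated to the Jina AI integration itself):

GET film_index/_count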

Step 4: Semantic Search and Reranking

Now, you can search film_index using semantic embedding vectors. The API call below will:

  • Create an embedding for the query string “An inspiring love story” using the jina_embeddings service.
  • Compare the resulting embedding to the embeddings stored in film_index.
  • Return the stored documents whose blurb fields best match the query.

GET film_index/_search 
{
  "query": {
    "semantic": {
      "field": "blurb",
      "query": "An inspiring love story"
    }
  }
}

Now, let’s use jina_rerank. It will perform the same query-matching procedure as above, then take the 50 best matches (specified by the rank_window_size field) and use the jina_rerank service to rank those results more precisely, returning the top 10 (as specified earlier in the jina_rerank endpoint configuration).

POST film_index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "semantic": {
              "field": "blurb",
              "query": "An inspiring love story"
            }
          }
        }
      },
      "field": "blurb",
      "rank_window_size": 50,
      "inference_id": "jina_rerank",
      "inference_text": "An inspiring love story"
    }
  }
}

RAG with Elasticsearch and Jina AI

As developers use Elasticsearch for their RAG use cases, native support for Jina AI’s models in the inference API provides low-cost, seamless access to Jina AI’s search foundations. Developers can use this integration today in Elastic Cloud Serverless, and it will soon be available in Elasticsearch 8.18. Thank you, Jina AI team, for the contribution!

  • Try this notebook for an end-to-end example of using the inference API with the Jina AI models.
  • To learn more about Jina AI models, visit jina.ai and the Jina AI blog.

Elasticsearch has native integrations with industry-leading Gen AI tools and providers. Check out our webinars on going Beyond RAG Basics, or building prod-ready apps with the Elastic Vector Database.

To build the best search solutions for your use case, start a free cloud trial for a fully managed Elastic Cloud project or try Elastic on your local machine in a few minutes with `curl -fsSL https://elastic.co/start-local | sh`.


