Elastic and Tencent Cloud AI search technology make Dunhuang Digital Scripture Cave accessible to everyone

Ancient books in the Dunhuang Digital Scripture Cave have now become accessible to everyone online. The Dunhuang Research Institute officially released the Digital Scripture Cave database, including more than 9,900 volumes of Dunhuang documents and more than 60,700 images that contain Buddhist scriptures, legal codes, contracts, silk paintings, and more.
With the help of Elastic and Tencent Cloud AI search and large language model (LLM) technology, the traditional Chinese and rare characters in these ancient books have come back to life. Once obscure and difficult to understand, they are now searchable and translatable.
The Dunhuang Digital Scripture Cave brings a civilization once hidden deep in the caves before everyone's eyes.

Reinterpreting Dunhuang: AI makes ancient books understandable
In the past, approaching a volume of Dunhuang ancient books required flipping through rubbings and reading annotations. Now, these books can be opened in the Dunhuang Digital Scripture Cave with the help of AI.
AI summary: Master the essence of a 10,000-word article in 10 seconds
The threshold for understanding ancient books is high. Take the Diamond Sutra, for example. It has more than 5,000 characters, arranged vertically and written in traditional Chinese.

But with the support of AI technology, obscure Buddhist scriptures become clear and readable. The Diamond Sutra homepage automatically extracts the summary of the vernacular version:
“The Buddha told Subhuti that all conditioned dharmas are like dreams and bubbles, and should not be attached to form. Bodhisattvas should have no self-image, human image, sentient beings image, or life image, and practice charity with a non-dwelling mind to achieve enlightenment. The true Dharma cannot be described in words. Only by transcending concepts and distinctions can the essence of Tathagata be seen.”

AI intelligently summarizes the background knowledge of the Diamond Sutra, introducing the historical status, core ideas, structural characteristics, and cultural influences.
Intelligent Q&A: AI assistant is online 24/7 to answer questions
In just a matter of seconds, the AI assistant can answer questions like, “In which year was the Sutra Cave discovered?” or “How many characters are there in the Diamond Sutra?” In testing by the Dunhuang expert group, the answer accuracy rate was as high as 95%.

Multilingual translation: Switch between Chinese, English, French, and Japanese
The platform also supports translation between Chinese, English, French, Japanese, and other languages. Users can write to the AI assistant in any of these four languages, and it will reply in the same language, switching between Chinese, English, French, and Japanese without barriers. One more language means one more way for the world to understand Dunhuang.

AI search: Decode scriptures and understand ancient books
The transformation from thousand-year-old fragments to digital interpretation rests on a set of AI search technologies tailored for ancient books. Tencent combines Tencent Cloud Elasticsearch Service (ES) with an LLM to build retrieval augmented generation (RAG) capabilities that respond quickly and accurately when users sort and search the massive body of ancient book information.

Whenever a user enters a question, the system will find all the literature information related to this question and then use the powerful generation capabilities of the LLM to provide users with accurate answers.
Ancient text tokenization: Let AI understand the thousand-year jargon
Tokenization is a foundational technology for real-time retrieval of massive datasets. By decomposing text into semantically meaningful tokens, it ensures AI systems can interpret and process historical documents effectively.
Traditional tokenizers struggle with classical Chinese. For example, the phrase "佛在舍卫国祗树给孤独园" would be fragmented into individual characters, rendering it incomprehensible to both humans and machines.

The Tencent Cloud ES team addressed this challenge by developing an ancient Chinese tokenizer. In collaboration with Dunhuang experts, they refined tokenization rules for complex terminology, creating a specialized system tailored for Dunhuang manuscripts. This system transforms obscure texts into machine-readable tokens optimized for AI-driven semantic search.
Phrases like "佛在舍卫国祗树给孤独园" (Buddha in Savatthi, at the Jeta Grove, Anathapindika's Park) are now accurately tokenized as "佛 / 在 / 舍卫国 / 祗树 / 给孤独园" (Buddha / in / Savatthi / Jeta Grove / Anathapindika's Park), preserving contextual meaning.
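To make the difference concrete, here is a minimal sketch of dictionary-based tokenization using forward maximum matching. This is not the Tencent Cloud ES tokenizer, whose internals the post does not describe. The function name, the matching strategy, and the three-entry dictionary are all assumptions chosen for illustration.

```python
def tokenize(text, vocabulary):
    """Forward maximum matching: at each position, take the longest dictionary entry.

    Characters that start no dictionary entry fall back to single-character
    tokens, which is how a tokenizer without domain vocabulary behaves.
    """
    max_len = max((len(term) for term in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical domain dictionary; a real one would be curated with experts.
DUNHUANG_TERMS = {"舍卫国", "祗树", "给孤独园"}

phrase = "佛在舍卫国祗树给孤独园"
print(" / ".join(tokenize(phrase, set())))           # no dictionary: one token per character
print(" / ".join(tokenize(phrase, DUNHUANG_TERMS)))  # 佛 / 在 / 舍卫国 / 祗树 / 给孤独园
```

With an empty dictionary the phrase breaks into eleven single characters. With the domain terms it yields the five tokens shown above.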
Hybrid search: Find all relevant ancient books
After you ask a question, the Dunhuang Digital Scripture Cave takes two approaches to find relevant documents in the knowledge base. For example, if you ask, "What does the Diamond Sutra talk about?", Tencent Cloud ES starts two searches at the same time:
Keyword scanning (full-text search): Accurately captures key information, such as "Diamond Sutra" and "Subhuti"
Semantic radar (vector search): Interprets "What does the Diamond Sutra talk about?" as a request for core themes and automatically finds similar abstract concepts, such as "breaking attachment" and "non-dwelling mind"
After analysis and intelligent reranking, the documents most relevant to your question are passed to the LLM.
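One common way to merge a keyword result list with a vector result list is reciprocal rank fusion (RRF). The post does not say which fusion or reranking method the service uses, so the sketch below is an assumption. The document IDs and the constant k=60 are illustrative only.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for "What does the Diamond Sutra talk about?"
keyword_hits = ["doc_diamond_sutra", "doc_subhuti", "doc_contract"]
vector_hits = ["doc_non_dwelling", "doc_diamond_sutra", "doc_subhuti"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both searches (here `doc_diamond_sutra` and `doc_subhuti`) outrank documents found by only one, which is the behavior the hybrid approach is after.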
LLM integration: Dual-model drive is more accurate
Tencent Cloud ES seamlessly integrates with Tencent's Hunyuan LLM and DeepSeek.
By combining user queries with contextual documents, these LLMs generate precise responses. This dual-model architecture enhances retrieval accuracy and ensures that generated answers are contextually relevant and factually grounded.
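Combining a user query with contextual documents usually means assembling a single prompt before calling the model. The sketch below shows one plausible shape for that step. The function name, the prompt wording, and the document fields (`title`, `text`) are assumptions, and no actual Hunyuan or DeepSeek API call is shown.

```python
def build_prompt(question, documents, max_docs=3):
    """Combine a user question with retrieved passages into one LLM prompt."""
    passages = [
        f"[{number}] {doc['title']}: {doc['text']}"
        for number, doc in enumerate(documents[:max_docs], start=1)
    ]
    context = "\n".join(passages)
    return (
        "Answer the question using only the numbered passages below. "
        "If the passages do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved passages, paraphrasing the summary quoted earlier.
retrieved = [
    {"title": "Diamond Sutra", "text": "All conditioned dharmas are like dreams and bubbles."},
    {"title": "Diamond Sutra", "text": "Practice charity with a non-dwelling mind."},
]

prompt = build_prompt("What does the Diamond Sutra talk about?", retrieved)
print(prompt)
```

Instructing the model to rely only on the supplied passages is one way to keep answers grounded in the retrieved documents rather than in the model's general knowledge.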
In the future, there will be more "Dunhuang" waiting to be gently illuminated.
Tencent Cloud Elasticsearch Service: A cloud-native, one-stop AI search service
The Tencent Cloud Elasticsearch Service is a one-stop AI search and log analysis service on the cloud. Through a strategic partnership with Elastic, Tencent Cloud Elasticsearch offers Elastic's commercial subscription. Its advantages include a high-performance, self-developed kernel, one-stop data access and index management, intelligent inspection, and one-click upgrades, all of which help users efficiently build massive data retrieval and analysis services.
The service also supports a serverless model with on-demand payment, automatic elasticity, and completely maintenance-free operation, greatly improving the user's cloud search experience. Tencent Cloud ES helped the Dunhuang project achieve the global launch of the Digital Scripture Cave, successfully promoting the digitization and global dissemination of cultural heritage.
Learn more about Tencent Cloud Elasticsearch Service and get started with a free trial.