In any large catalog, from e-commerce products to article listings, duplicates are inevitable. While Elasticsearch's collapse feature is excellent for grouping these variants and presenting a clean UI, it introduces a critical challenge for pagination. The total hit count reflects the original number of documents, not the final number of unique groups, making it impossible to build a reliable pager. This post details the definitive pattern to solve this by pairing collapse with a cardinality aggregation to get the true result count.
Pagination with collapse and cardinality
A frequent challenge in building efficient search experiences with Elasticsearch, particularly in e-commerce, is combining deduplication with pagination (dividing a large result set into smaller, manageable pages). For instance, a product catalog might include several variants (sizes, colors, SKUs) of the same product design.
Using the collapse feature, we can group hits by a single field to show a single representative result per group. This avoids flooding the UI with near-duplicates. Let’s walk through a practical example.
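To make the grouping concrete, here is a minimal Python sketch of what collapsing amounts to conceptually: keep the first hit seen for each value of the group field, in ranked order. It is an illustration only, not how Elasticsearch implements collapse; the function name `collapse_hits` and the sample hits are hypothetical.

```python
from typing import Any


def collapse_hits(hits: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Keep the first (best-ranked) hit for each distinct value of `field`.

    `hits` is assumed to be already sorted by relevance or by the chosen sort.
    """
    seen: set[Any] = set()
    collapsed: list[dict[str, Any]] = []
    for hit in hits:
        key = hit[field]
        if key not in seen:
            seen.add(key)
            collapsed.append(hit)
    return collapsed


# Hypothetical ranked hits: two variants of P001 and one of P002.
ranked = [
    {"product_id": "P001", "sku": "sku-1001"},
    {"product_id": "P001", "sku": "sku-1002"},
    {"product_id": "P002", "sku": "sku-2001"},
]
representatives = collapse_hits(ranked, "product_id")
print([hit["sku"] for hit in representatives])  # ['sku-1001', 'sku-2001']
```

Three ranked hits go in and two representatives come out, one per product.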
Example: Combining SKUs with collapse
Here are some example JSON documents that you can import through Kibana Dev Tools. They illustrate the scenario described above, where product variants (sizes and colors) need to be grouped.
Explanation of the JSON Structure
Each document represents a specific product variant (SKU).
- product_id: A unique identifier for the core product. This is the field you will use for the collapse and cardinality aggregation.
- sku: The unique stock keeping unit for the specific variant.
- name: The general name of the product.
- color: The color of the product variant.
- size: The size of the product variant.
- price: The price of the specific variant.
- timestamp: A timestamp to allow for sorting and selecting the most recent version if needed.
To import the following data, paste the lines into Kibana Dev Tools and click the play button that appears at the top right of the request block.
POST your-products-index/_bulk
{ "index" : { "_id" : "sku-1001" } }
{ "product_id": "P001", "sku": "sku-1001", "name": "Classic T-Shirt", "color": "White", "size": "S", "price": 19.99, "timestamp": "2025-06-23T10:00:00Z" }
{ "index" : { "_id" : "sku-1002" } }
{ "product_id": "P001", "sku": "sku-1002", "name": "Classic T-Shirt", "color": "White", "size": "M", "price": 19.99, "timestamp": "2025-06-23T10:01:00Z" }
{ "index" : { "_id" : "sku-1003" } }
{ "product_id": "P001", "sku": "sku-1003", "name": "Classic T-Shirt", "color": "White", "size": "L", "price": 19.99, "timestamp": "2025-06-23T10:02:00Z" }
{ "index" : { "_id" : "sku-1004" } }
{ "product_id": "P001", "sku": "sku-1004", "name": "Classic T-Shirt", "color": "Black", "size": "M", "price": 21.99, "timestamp": "2025-06-23T11:00:00Z" }
{ "index" : { "_id" : "sku-2001" } }
{ "product_id": "P002", "sku": "sku-2001", "name": "V-Neck Sweater", "color": "Navy", "size": "M", "price": 49.95, "timestamp": "2025-06-22T14:00:00Z" }
{ "index" : { "_id" : "sku-2002" } }
{ "product_id": "P002", "sku": "sku-2002", "name": "V-Neck Sweater", "color": "Navy", "size": "L", "price": 49.95, "timestamp": "2025-06-22T14:01:00Z" }
{ "index" : { "_id" : "sku-2003" } }
{ "product_id": "P002", "sku": "sku-2003", "name": "V-Neck Sweater", "color": "Grey", "size": "S", "price": 45.95, "timestamp": "2025-06-22T15:00:00Z" }
{ "index" : { "_id" : "sku-3001" } }
{ "product_id": "P003", "sku": "sku-3001", "name": "Running Shorts", "color": "Blue", "size": "32", "price": 35.00, "timestamp": "2025-06-21T09:30:00Z" }
{ "index" : { "_id" : "sku-3002" } }
{ "product_id": "P003", "sku": "sku-3002", "name": "Running Shorts", "color": "Red", "size": "34", "price": 35.00, "timestamp": "2025-06-21T09:31:00Z" }
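Now we can run the search and collapse on the product_id.keyword field to get the results back. (With the default dynamic mapping, product_id is indexed as text with a keyword sub-field, and collapse needs a keyword or numeric field with doc values.)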
GET your-products-index/_search
{
"query": {
"match_all": {}
},
"collapse": {
"field": "product_id.keyword"
}
}
And this is what the results look like:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 9,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "your-products-index",
"_id": "sku-1003",
"_score": 1,
"_source": {
"product_id": "P001",
"sku": "sku-1003",
"name": "Classic T-Shirt",
"color": "White",
"size": "L",
"price": 19.99,
"timestamp": "2025-06-23T10:02:00Z"
},
"fields": {
"product_id.keyword": [
"P001"
]
}
},
{
"_index": "your-products-index",
"_id": "sku-2001",
"_score": 1,
"_source": {
"product_id": "P002",
"sku": "sku-2001",
"name": "V-Neck Sweater",
"color": "Navy",
"size": "M",
"price": 49.95,
"timestamp": "2025-06-22T14:00:00Z"
},
"fields": {
"product_id.keyword": [
"P002"
]
}
},
{
"_index": "your-products-index",
"_id": "sku-3001",
"_score": 1,
"_source": {
"product_id": "P003",
"sku": "sku-3001",
"name": "Running Shorts",
"color": "Blue",
"size": "32",
"price": 35,
"timestamp": "2025-06-21T09:30:00Z"
},
"fields": {
"product_id.keyword": [
"P003"
]
}
}
]
}
}
However, there’s an important caveat:
The hits.total.value returned still reflects the total number of matching documents, not the number of unique groups after collapsing. This means that relying on hits.total.value for pagination breaks the UX and misrepresents the number of result pages. If the UI shows 5 results per page, a total of 9 implies that a second page exists. But loading page 2 would return nothing, because there are only 3 collapsed results, not 9.
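The mismatch is easy to see with a little arithmetic. The sketch below, assuming a page size of 5 and the nine sample documents above, compares the page count a pager would derive from hits.total.value with the page count the three collapsed groups actually need. The helper name `page_count` is hypothetical.

```python
import math


def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to show `total_items` at `page_size` per page."""
    return math.ceil(total_items / page_size)


PAGE_SIZE = 5          # assumed UI page size
total_documents = 9    # hits.total.value from the response above
unique_groups = 3      # collapsed hits actually returned (P001, P002, P003)

pages_from_hits_total = page_count(total_documents, PAGE_SIZE)  # 2
pages_actually_needed = page_count(unique_groups, PAGE_SIZE)    # 1

print(pages_from_hits_total, pages_actually_needed)
```

The pager would advertise two pages while only one has any content.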
Solution: Combine collapse with cardinality
By adding a cardinality aggregation on the same collapse field, we can accurately compute the number of distinct groups, enabling reliable and predictable pagination.
Here’s an example query:
GET your-products-index/_search
{
"query": {
"match_all": {}
},
"collapse": {
"field": "product_id.keyword"
},
"aggs": {
"total_uniques": {
"cardinality": {
"field": "product_id.keyword"
}
}
},
"sort": [
{"timestamp": {"order": "desc"}}
],
"size": 5,
"from": 0
}
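On the application side, the pager can then read the group count from the aggregation instead of from hits.total.value. The following Python sketch assumes the search response has already been parsed into a dict (for example by an HTTP or Elasticsearch client) and that the aggregation is named total_uniques as in the query above. The `pager_info` helper and the trimmed response fragment are hypothetical, written by hand to mirror the shape of a response for the sample data.

```python
import math
from typing import Any


def pager_info(response: dict[str, Any], page_size: int, offset: int) -> dict[str, Any]:
    """Derive pager metadata from the cardinality aggregation of a collapsed search."""
    # Cardinality is approximate for large value sets, so treat this as an estimate.
    total_groups = response["aggregations"]["total_uniques"]["value"]
    total_pages = math.ceil(total_groups / page_size)
    current_page = offset // page_size + 1
    return {
        "total_groups": total_groups,
        "total_pages": total_pages,
        "current_page": current_page,
        "has_next": current_page < total_pages,
    }


# Hand-written fragment shaped like the response for the nine sample documents.
sample_response = {
    "hits": {"total": {"value": 9, "relation": "eq"}, "hits": []},
    "aggregations": {"total_uniques": {"value": 3}},
}

info = pager_info(sample_response, page_size=5, offset=0)
print(info)
```

With three groups and a page size of 5, the pager reports a single page and no "next" link, even though hits.total.value is still 9.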
Conclusion
By now, the challenge of pagination with collapsed search results should be clear, as should the definitive solution. Relying on the default hits.total.value when using the collapse feature inevitably leads to a broken user experience, displaying incorrect page counts and frustrating users.
The key takeaway is the robust pattern of pairing the collapse query with a cardinality aggregation on the very same field.
- collapse: Deduplicates results at query-time by grouping on collapseField.
- cardinality: Returns an approximate count of unique collapseField values, essential for paginating over grouped results.
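- sort + from: Still respected. sort determines which document represents each group and the order of the groups; from is an offset into the collapsed groups.
- hits.total.value: Reflects the total number of matching documents, not the number of groups. Don't use it for pagination in collapsed queries.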
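Putting the pieces together, a request body for an arbitrary page could be built as below. This is a sketch, not a definitive implementation: `build_collapsed_query` is a hypothetical helper, the field and sort names are taken from the example above, and `precision_threshold` is an optional setting of the cardinality aggregation that trades memory for accuracy (the value used here is an arbitrary choice).

```python
import json
from typing import Any


def build_collapsed_query(field: str, page: int, page_size: int,
                          precision_threshold: int = 3000) -> dict[str, Any]:
    """Build a search body that collapses on `field` and counts its distinct values.

    `page` is 1-based; `from` is the offset into the collapsed groups.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "collapse": {"field": field},
        "aggs": {
            "total_uniques": {
                "cardinality": {
                    "field": field,  # same field as collapse, so the counts line up
                    "precision_threshold": precision_threshold,
                }
            }
        },
        "sort": [{"timestamp": {"order": "desc"}}],
        "size": page_size,
        "from": (page - 1) * page_size,
    }


body = build_collapsed_query("product_id.keyword", page=2, page_size=5)
print(json.dumps(body, indent=2))
```

Passing the same `field` to both collapse and cardinality in one place makes it harder for the two to drift apart, which is the point of the pattern.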