Docs Menu
Docs Home
/
Database Manual
/ /

Create Selective Indexes to Answer Queries Efficiently

On this page

  • Examples

Selectivity is a query property that describes the ratio of documents matching the query versus the total number of documents in a collection. The selectivity of an index describes how many documents a unique index key matches. A query or index has high selectivity when proportionally few documents match a query or a given index key.

Because indexes can have different selectivities depending on the index keys used, ensure that the most selective indexes are available based on the predicates contained in a query. To ensure the most efficient query execution, create indexes that most uniquely match the predicates contained in a query.

Consider a collection of documents that have the following form:

{
status: "processed",
product_type: "electronics"
}

In this example, the status of 99% of documents in the collection is processed. If you add an index on status and query for documents with the status of processed, both the index and the query have low selectivity. However, if you want to query for documents that do not have the status of processed, the index and the query have high selectivity because the query only returns 1% of the documents in a collection.

Consider a collection of documents where the status field has three values distributed across the collection:

[
{ _id: ObjectId(), status: "processed", product_type: "electronics" },
{ _id: ObjectId(), status: "processed", product_type: "grocery" },
{ _id: ObjectId(), status: "processed", product_type: "household" },
{ _id: ObjectId(), status: "pending", product_type: "electronics" },
{ _id: ObjectId(), status: "pending", product_type: "grocery" },
{ _id: ObjectId(), status: "pending", product_type: "household" },
{ _id: ObjectId(), status: "new", product_type: "electronics" },
{ _id: ObjectId(), status: "new", product_type: "grocery" },
{ _id: ObjectId(), status: "new", product_type: "household" }
]

If you add an index on status and query for { "status": "pending", "product_type": "electronics" }, MongoDB must read three index keys, retrieve three documents matching that status, and filter those documents further on product_type to return the one matching document. Similarly, a query for { "status": {$in: ["processed", "pending"] }, "product_type" : "electronics" } must read six documents to return the two matching documents.

Consider the same index on a collection where status has nine values distributed across the collection:

[
{ _id: ObjectId(), status: 1, product_type: "electronics" },
{ _id: ObjectId(), status: 2, product_type: "grocery" },
{ _id: ObjectId(), status: 3, product_type: "household"},
{ _id: ObjectId(), status: 4, product_type: "electronics" },
{ _id: ObjectId(), status: 5, product_type: "grocery"},
{ _id: ObjectId(), status: 6, product_type: "household"},
{ _id: ObjectId(), status: 7, product_type: "electronics" },
{ _id: ObjectId(), status: 8, product_type: "grocery" },
{ _id: ObjectId(), status: 9, product_type: "household" }
]

If you query for { "status": 2, "product_type": "grocery" }, MongoDB only reads one document matching the index key, indicating the index is highly selective. By using this index, you can receive a query response more efficiently, since MongoDB must only further filter one document matching the index value. In this case, the filter also matches, and the query only returns one document.

Although this example's query on status equality is more selective, a query such as { "status": { $gt: 5 }, "product_type": "grocery" } still needs to read four documents if you use the same index on status. However, if you create a compound index on product_type and status, MongoDB can more efficiently answer a query for {"status": { $gt: 5 }, "product_type": "grocery" } via the compound index, as the query returns only one matching document.

To improve query performance, you can create a compound index that narrows the documents that queries read. For example, if you want to improve performance for queries on status and product_type, you could create a compound index on those two fields.

If MongoDB reads a relatively large number of documents to return results, some queries may perform faster without indexes. To determine performance, see Measure Index Use.

Back

Sort Query Results

On this page