CopyDisable

Saturday, 21 May 2016

Index Filters in MongoDB

MongoDB query optimizer processes queries and pick out the most efficient query plan for a query. MongoDB query system uses this plan each time the query runs. MongoDB optimizer chooses the optimal index (if indexes are available) for a query.
MongoDB optimizer works very well but sometimes we may have a better idea of which index to use for a given query. We can run the hint() method on a query to override query optimizer’s index selection process and tell the system which index should be used for the given query. So we have to specify hint() method from client side every time we want to override the index selection process. Sometimes we may have better idea about a query and the index to be used for that query and also we don’t want end user to override the index selection process by providing hint(). For all these the solution is Index Filters.
Index filter provides us a temporary (index filter do not persist after shutdown) way to inform MongoDB that a particular query type should use particular index. It determines which indexes the optimizer evaluates for a query shape (a query shape consists of the query itself, any sort criteria and any projection specifications). So if we have specified an index filter for a specific query type, then we don’t have add hint() to the same query. Also hint() is ignored by MongoDB when index filter exists for the particular query shape.
I will show one example to clarify the concept.
I have one collection users, having following data:  
{ "userID" : 1001, "name" : "Pranab Sharma", "city" : "Mumbai", "favFood" : "Chinese", "favDrink" : "beer" }
{ "userID" : 1002, "name" : "Danish Khan", "city" : "Guwahati", "favFood" : "Chinese", "favDrink" : "beer" }
{ "userID" : 1003, "name" : "Samir Das", "city" : "Mumbai", "favFood" : "Continental", "favDrink" : "milk" }
{ "userID" : 1004, "name" : "John Butler", "city" : "Mumbai", "favFood" : "Indian", "favDrink" : "vodka" }
{ "userID" : 1005, "name" : "Xi Xen", "city" : "Guwahati", "favFood" : "Chinese", "favDrink" : "wine" }
{ "userID" : 1006, "name" : "Vladimir Pulaxy", "city" : "Guwahati", "favFood" : "Chinese", "favDrink" : "beer" }
{ "userID" : 1007, "name" : "Karina Ali", "city" : "Mumbai", "favFood" : "Mexican", "favDrink" : "beer" }

This collection has two user defined indexes:
  • { "userID" : 1, "favDrink" : 1  }
  • { "userID" : 1, "city" : 1 }
Suppose we have a query which finds out the users having userID greater than equal to 1003, loves to drink beer and then sorts the result by the city field.
db.users.find({"userID" : {"$gte": 1003}, "favDrink": "beer"}).sort({"city": 1})
After running the query, if we check the execution stats of the query in MongoDB’s log (set the log level to 1 to get execution stats of the query using the command:  db.adminCommand({setParameter:1, logLevel:1} ) :
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1003.0 }, favDrink: "beer" }, sort: { city: 1.0 } } planSummary: IXSCAN { userID: 1.0, city: 1.0 } keysExamined:5 docsExamined:5 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:342
From the above log line, we can see that the query optimizer has choosen the index { "userID" : 1, "city" : 1 } for our query. It scanned 5 index entries and 5 documents scanned.
Suppose, we know that if we use the index {"userID":1, "favDrink": 1 } for this query, then the system will require less number of document scanning. Let’s run the query providing this index in hint():
db.users.find({"userID" : {"$gte": 1003}, "favDrink": "beer"}).sort({"city": 1}).hint({"userID":1, "favDrink": 1 })
Now the execution stats for this query:
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1003.0 }, favDrink: "beer" }, sort: { city: 1.0 }, hint: { userID: 1.0, favDrink: 1.0 } } planSummary: IXSCAN { userID: 1.0, favDrink: 1.0 } keysExamined:5 docsExamined:2 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:342
This time MongoDB scanned 2 documents, 3 documents less than our previous attempt without hint().
Say, we want to enforce the use of the index {"userID":1, "favDrink": 1 }  for our above query. So we can set an index filter Smile, instead of informing everyone to use the index {"userID":1, "favDrink": 1 } in hint().

To create an index filter, we can use the command planCacheSetFilter. This command has the follwing syntax:
db.runCommand(
   {
      planCacheSetFilter: <collection>,
      query: <query>,
      sort: <sort>,
      projection: <projection>,
      indexes: [ <index1>, <index2>, ...]
   }
)

So for our example, the command will be:
db.runCommand(
{
     planCacheSetFilter: "users",
     query: {"userID" : {"$gte": 1003}, "favDrink": "beer"},
     sort: {"city": 1},
     projection: {},
     indexes: [{"userID":1, "favDrink": 1 }]
} )


image
Our index filter is in place, now let’s run the query without hint():
db.users.find({"userID" : {"$gte": 1003}, "favDrink": "beer"}).sort({"city": 1})
image
Checking the execution status of the query:
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1003.0 }, favDrink: "beer" }, sort: { city: 1.0 } } planSummary: IXSCAN { userID: 1.0, favDrink: 1.0 } keysExamined:5 docsExamined:2 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:342
Yes, we can see MongoDB used the index {"userID":1, "favDrink": 1 } for our query (as we specified in our index filter), so our index filter worked.


To check whether MongoDB really applied an index filter for our query we can use the explain() method and check the indexFilterSet field. If it is set to true, that means MongoDB had applied index filter.
image

If we can change the comparision value for userID field’s $gte and for favDrink, then also our index filter will work.
Let’s examine:
db.users.find({"userID" : {"$gte": 1001}, "favDrink": "wine"}).sort({"city": 1})
image
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1001.0 }, favDrink: "wine" }, sort: { city: 1.0 } } planSummary: IXSCAN { userID: 1.0, favDrink: 1.0 } keysExamined:6 docsExamined:1 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:221
We can see that MongoDB used the index that we specified in the index filter definition. As changing the value does not change the query shape, so MongoDB used that index filter.
But if we change $gte to say $gt or $lt then our index filter will not work and MongoDB will again use the index { "userID" : 1, "city" : 1 }.
Let’s examine:
db.users.find({"userID" : {"$gt": 1003}, "favDrink": "beer"}).sort({"city": 1})
command test.users command: find { find: "users", filter: { userID: { $gt: 1003.0 }, favDrink: "beer" }, sort: { city: 1.0 } } planSummary: IXSCAN { userID: 1.0, city: 1.0 } keysExamined:4 docsExamined:4 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:342
From above log, we can see that MongoDB used the default index { userID: 1, city: 1 }, as changing $gte to $gt changed the query shape.


Now let’s examine if providing a hint for a query works, if we have index filter in place for that query:
 db.users.find({"userID" : {"$gte": 1001}, "favDrink": "wine"}).sort({"city": 1}).hint({ userID: 1, city: 1 })
We provided the { userID: 1, city: 1 } index as a hint to our query.
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1001.0 }, favDrink: "wine" }, sort: { city: 1.0 }, hint: { userID: 1.0, city: 1.0 } } planSummary: IXSCAN { userID: 1.0, favDrink: 1.0 } keysExamined:6 docsExamined:1 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:221
From the log, it is clear that MongoDB used the { userID: 1, favDrink: 1 } index, it did not consider the index { userID: 1, city: 1 } that we specified in the hint.


We can use the planCacheListFilters command to get the list of index filters for a given collection:
db.runCommand( { planCacheListFilters : "users" })
image

Also we can run the planCacheClearFilters command to remove a specific index filter or all the index filters in a collection. To remove a specific index filter we have to specify the query shape. For our example:
db.runCommand(
{
    planCacheClearFilters: "users",
    "query" : {"userID" : {"$gte" : 1003},"favDrink" : "beer"},
    "sort": {"city" : 1},
    "projection": {}
} )
image
To clear all index filters on a collection, just omit the query shape in the planCacheClearFilters command:
db.runCommand({planCacheClearFilters: "users"})
image
Index filter is a very nice tool for optimizing MongoDB experience, so try it out and enjoy Thumbs up.