Pranab's scrapbook: MongoDB text search score calculation

Wednesday, 6 February 2019

MongoDB text search score calculation

Full-text search is the ability to efficiently search for strings within larger strings, much like finding a keyword in a large body of text. MongoDB uses a text index and the $text operator to perform text search.

MongoDB text search assigns a score to each document that contains the search term in the indexed fields. This score determines the relevance of a document to a given search query.

When a text index covers more than one field, one field is sometimes more important than another. We can express the importance of a field by specifying a weight.

For a text index, the weight of an indexed field denotes the significance of the field relative to the other indexed fields in terms of the text search score. The default weight for every indexed field is 1.

Using the example collection from the MongoDB full-text search documentation, I created the stores collection. The sample data of the collection is shown below:

Now I created a text index specifying weights:

In the created index:

• the name field has a weight of 10, and

• the description field has a weight of 5.

So a match in the name field has twice (i.e. 10:5) the impact of a match in the description field.
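An index with these weights could be created roughly as follows. This is a minimal sketch, assuming the mongo shell and a collection named stores; the variable names indexKeys and indexOptions are illustrative, and the typeof db guard is only there so the snippet also loads outside the shell (for example in Node.js).

```javascript
// Index keys: both fields are indexed as text.
const indexKeys = { name: "text", description: "text" };

// Weights from the post: a match in name counts twice as much as one in description.
const indexOptions = { weights: { name: 10, description: 5 } };

// `db` exists only inside the mongo shell; outside it this block is skipped.
if (typeof db !== "undefined") {
  db.stores.createIndex(indexKeys, indexOptions);
}
```

Any indexed field left out of weights keeps the default weight of 1.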

So how is the score calculated for a document in a search query?

As per the MongoDB documentation:

• For each indexed field in the document, MongoDB multiplies the number of matches by the weight and sums the results.

• Using this sum, MongoDB then calculates the score for the document.

To check the score, we can use the $meta operator:
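A minimal sketch of such a query, assuming the stores collection and the weighted text index described above. The search term Burger is the one used in Example 1; the variable names are illustrative, and the typeof db guard is only there so the snippet also loads outside the mongo shell.

```javascript
// Text query: match documents whose indexed fields contain the search term.
const filter = { $text: { $search: "Burger" } };

// Projection: expose the computed relevance score as a field named "score".
const projection = { score: { $meta: "textScore" } };

// `db` exists only inside the mongo shell; outside it this block is skipped.
if (typeof db !== "undefined") {
  db.stores
    .find(filter, projection)
    .sort({ score: { $meta: "textScore" } }) // most relevant documents first
    .forEach(printjson);
}
```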

When I saw the score associated with a document, I was confused and curious to know how the score is actually calculated. The points from the MongoDB documentation above did not make it clear.

I searched and found an answer in the mongodb-user Google Group (https://groups.google.com/forum/#!topic/mongodb-user/99t5WXmUUAg) which shed some light on the score calculation.

As per my understanding, I arrived at the following points for the score calculation:

MongoDB uses a weight coefficient that adjusts the score. The relevant line from the MongoDB source is:

• From the https://github.com/mongodb/mongo/blob/v4.0/src/mongo/db/fts/fts_spec.cpp file:

• double coeff = (0.5 * data.count / numTokens) + 0.5;

• data.count -> number of times the search term occurs in the field

• numTokens -> total number of tokens in the field after stemming and removing stop words

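score = coeff * weight

• weight -> weight of the indexed field, default 1

• The source also multiplies this product by a per-term frequency factor (data.freq), which is 1 when the term occurs only once in the field, as in both examples below.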
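The formula above can be sketched as a small function. This is an illustration of the simplified calculation in this post, not MongoDB's actual implementation; the name fieldScore and its parameters are hypothetical, and the function assumes the term occurs at most once in the field.

```javascript
// Simplified per-field score from this post: coeff * weight.
//   matchCount: occurrences of the search term in the field (data.count)
//   numTokens:  tokens in the field after stemming and stop-word removal
//   weight:     index weight of the field (defaults to 1)
function fieldScore(matchCount, numTokens, weight = 1) {
  if (numTokens <= 0 || matchCount <= 0) {
    return 0; // no tokens or no match: the field contributes nothing
  }
  const coeff = (0.5 * matchCount) / numTokens + 0.5;
  return coeff * weight;
}

console.log(fieldScore(1, 2, 10)); // 7.5
console.log(fieldScore(1, 4, 5)); // 3.125
```

Because coeff always lies between 0.5 and 1 for a matching field, a single match contributes between half of the field weight and the full field weight.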

I will show two examples of score calculation.

Example 1:

Here we search for the word Burger, and one document is found with a score of 7.5. The word Burger is present in the name field only.

• Number of matches in the document: 1

• Number of tokens: 2 (Burger, Buns), after stemming and removing stop words

• Weight for name field: 10

coeff = (0.5 * data.count / numTokens) + 0.5

= (0.5 * 1/2) + 0.5

= 0.75

score = coeff * weight

= 0.75 * 10

= 7.5

Example 2:

Here we search for the word Samosa, and one document is found with a score of 10.625. The word Samosa is present in both the name and description fields.

• Match for name field:

– Number of matches: 1

– Number of tokens: 2 (Samosa, Tea), after stemming and removing stop words

– Weight for name field: 10

coeff = (0.5 * data.count / numTokens) + 0.5

= (0.5 * 1/2) + 0.5

= 0.75

Score for name field = coeff * weight

= 0.75 * 10

= 7.5

• Match for description field:

– Number of matches: 1

– Number of tokens: 4 (Hot, Samosa, hot, tea), after stemming and removing stop words

– Weight for description field: 5

coeff = (0.5 * data.count / numTokens) + 0.5

= (0.5 * 1/4) + 0.5

= 0.625

Score for description field = coeff * weight

= 0.625 * 5

= 3.125

Total score = Score for name + Score for description

= 7.5 + 3.125

= 10.625
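The per-field calculation and the final sum can be reproduced with a short sketch. The token lists are the ones given in Example 2; the function names scoreFieldTokens and documentScore are hypothetical, tokens are compared case-insensitively without any stemming, and the simplified formula assumes the term occurs once per field.

```javascript
// Score one field from its token list (already stop-word filtered).
function scoreFieldTokens(tokens, term, weight) {
  const needle = term.toLowerCase();
  const count = tokens.filter((t) => t.toLowerCase() === needle).length;
  if (count === 0) {
    return 0; // the term does not occur in this field
  }
  const coeff = (0.5 * count) / tokens.length + 0.5;
  return coeff * weight;
}

// Sum the per-field scores to get the document score.
function documentScore(fields, term) {
  return fields.reduce(
    (sum, field) => sum + scoreFieldTokens(field.tokens, term, field.weight),
    0
  );
}

// Token lists and weights taken from Example 2.
const samosaDoc = [
  { tokens: ["Samosa", "Tea"], weight: 10 }, // name field
  { tokens: ["Hot", "Samosa", "hot", "tea"], weight: 5 }, // description field
];

console.log(documentScore(samosaDoc, "Samosa")); // 10.625
```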

