Full text search is the ability to efficiently search for
strings within larger strings, much like
finding a keyword in a large body of text. MongoDB uses a text index
and the $text operator to perform text search.
In MongoDB, text search assigns a score to each document that
contains the search term in the indexed fields. This score determines the
relevance of a document to a given search query.
When we have a text index on more than one field, sometimes
one field is more important than the others. We can express the
importance of a field by specifying a weight for it.
For a text index, the weight of an
indexed field denotes the significance of the field relative to the other
indexed fields in terms of the text search score. The default weight for every
indexed field is 1.
Using the example collection from the MongoDB full text search documentation,
I created the stores collection. The sample data of the collection is shown below:
Next, I created a text index specifying weights:
In the created index:
- the name field has a weight of 10, and
- the description field has a weight of 5.
So a match in the name field has 2 times (i.e. 10:5) the
impact of a match in the description field.
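A minimal sketch of how such a weighted index could be created from Python, assuming the PyMongo driver and a handle to the stores collection passed in by the caller. The helper names `create_weighted_text_index` and `relative_impact` are hypothetical and only illustrate the weights described above; this is not the original shell command.

```python
# Hypothetical helpers illustrating the weighted text index described above.
# Assumes `collection` behaves like a PyMongo Collection (e.g. db["stores"]).

TEXT_INDEX_KEYS = [("name", "text"), ("description", "text")]
TEXT_INDEX_WEIGHTS = {"name": 10, "description": 5}


def create_weighted_text_index(collection):
    """Create a text index on name and description with weights 10 and 5."""
    return collection.create_index(TEXT_INDEX_KEYS, weights=TEXT_INDEX_WEIGHTS)


def relative_impact(field_a, field_b, weights=TEXT_INDEX_WEIGHTS):
    """How many times more a match in field_a counts than one in field_b."""
    return weights[field_a] / weights[field_b]
```

With these weights, `relative_impact("name", "description")` gives the 10:5 ratio mentioned above.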
So how is the score calculated for a document
in a search query?
According to the MongoDB documentation:
• For each indexed field in the document, MongoDB multiplies the number of matches by
the weight and sums the results.
• Using this sum, MongoDB then calculates the score for the document.
To check the score, we can use the $meta
operator:
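As a rough illustration, the query and projection could be built like this from Python, assuming the PyMongo driver. The function names and the output field name `score` are my own choices for the sketch; only `$text`, `$search` and `$meta: "textScore"` come from MongoDB itself.

```python
def build_text_score_query(term):
    """Return (filter, projection) for a $text search that exposes the score."""
    query = {"$text": {"$search": term}}
    # "score" is an arbitrary output field name; $meta fills it with textScore.
    projection = {"score": {"$meta": "textScore"}}
    return query, projection


def find_with_score(collection, term):
    """Run the search on a PyMongo-like collection and return the documents."""
    query, projection = build_text_score_query(term)
    return list(collection.find(query, projection))
```

Each returned document then carries an extra `score` field next to its own fields.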
When I saw
the score associated with a document, I was confused and curious to know
how the score is actually calculated. The MongoDB score
calculation points above did not make it clear.
I searched and
found an answer in the mongodb-user Google Group (https://groups.google.com/forum/#!topic/mongodb-user/99t5WXmUUAg)
that shed some light on the score calculation.
As per my
understanding, I have arrived at the following points for the score calculation:
In MongoDB
we have a weight coefficient which adjusts the score. The code line from the
MongoDB source is:
• double coeff = (0.5 * data.count / numTokens) + 0.5;
• data.count -> number of matches in the field
• numTokens -> number of tokens in the field after stemming and removing stop words
score = coeff * weight
• weight -> weight for an indexed field, default 1
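The formula above can be sketched as a small Python function. This follows my simplified reading of the calculation, not the full MongoDB implementation, and the name `field_score` and the zero-guard are assumptions added for the sketch.

```python
def field_score(matches, num_tokens, weight=1.0):
    """Score contribution of one indexed field, per the simplified formula.

    matches    -> data.count, number of matches in the field
    num_tokens -> numTokens, tokens left after stemming / stop word removal
    weight     -> weight of the indexed field (default 1)
    """
    if matches <= 0 or num_tokens <= 0:
        # Assumption for this sketch: a field without matches contributes 0.
        return 0.0
    coeff = (0.5 * matches / num_tokens) + 0.5
    return coeff * weight
```

Note that the coefficient always falls between 0.5 and 1 for a matching field, so a single field can contribute at most its full weight under this formula.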
I will show two examples of score calculation.
Example 1:
Here we are searching for the word Burger, and one document was found with a score of 7.5. The word
Burger is present in the name field only.
• Number of matches in the document: 1
• Number of tokens: 2 (Burger, Buns), after stemming and removing stop words
• Weight for name field: 10
coeff = (0.5 * data.count / numTokens) + 0.5
= (0.5 * 1/2) + 0.5
= 0.75
score = coeff * weight
= 0.75 * 10
= 7.5
Example 2:
Here we are searching for the word Samosa, and one document
was found with a score of 10.625. The word Samosa is present in both the name
and the description fields.
• Match for name field:
– Number of matches: 1
– Number of tokens: 2 (Samosa, Tea), after stemming and removing stop words
– Weight for name field: 10
coeff = (0.5 * data.count / numTokens) + 0.5
= (0.5 * 1/2) + 0.5
= 0.75
Score for name field = coeff * weight
= 0.75 * 10
= 7.5
• Match for description field:
– Number of matches: 1
– Number of tokens: 4 (Hot, Samosa, hot, tea), after stemming and removing stop words
– Weight for description field: 5
coeff = (0.5 * data.count / numTokens) + 0.5
= (0.5 * 1/4) + 0.5
= 0.625
Score for description field = coeff * weight
= 0.625 * 5
= 3.125
Total score = Score for name + Score for description
= 7.5 + 3.125
= 10.625
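Both examples can be reproduced with a short Python sketch that sums the per-field scores. The function name `document_score` and the input layout (field name mapped to matches, tokens, weight) are assumptions made for illustration; the numbers are the ones worked out above.

```python
def document_score(fields):
    """Sum the per-field scores of one document.

    `fields` maps a field name to (matches, num_tokens, weight).
    Fields without a match are skipped and contribute nothing.
    """
    total = 0.0
    for matches, num_tokens, weight in fields.values():
        if matches <= 0:
            continue
        coeff = (0.5 * matches / num_tokens) + 0.5
        total += coeff * weight
    return total


# Example 1: Burger matches only in the name field.
burger = {"name": (1, 2, 10)}
# Example 2: Samosa matches in both name and description.
samosa = {"name": (1, 2, 10), "description": (1, 4, 5)}

print(document_score(burger))  # 7.5
print(document_score(samosa))  # 10.625
```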