1 00:00:02,170 --> 00:00:09,670 Speaking of multikey indexes, there is a special kind of multikey index, a text index. 2 00:00:09,790 --> 00:00:16,870 Let's say we have this text stored in a field in our document, it could be the description of a 3 00:00:16,870 --> 00:00:18,220 product. 4 00:00:18,220 --> 00:00:24,670 Now if you want to search that text, we saw before that we can use the $regex operator, but that 5 00:00:24,670 --> 00:00:27,460 is not a really great way of searching text, 6 00:00:27,480 --> 00:00:35,780 it performs poorly. It is better to use a text index. A text index is a special kind of 7 00:00:35,780 --> 00:00:42,590 index supported by MongoDB which will essentially turn this text into an array of single words and 8 00:00:42,590 --> 00:00:43,940 it will store it as such, 9 00:00:43,940 --> 00:00:50,150 so it stores it essentially as if you had an array of these single words. One extra thing it does for 10 00:00:50,150 --> 00:00:58,970 you is it removes all the stop words and it stems all words, so that you have an array of keywords essentially 11 00:00:59,210 --> 00:01:05,450 and things like "is" or "the" or "a" are not stored there because that is typically something you don't search 12 00:01:05,450 --> 00:01:07,690 for because it's all over the place, 13 00:01:07,700 --> 00:01:11,060 the keywords are what matter for text searches typically. 14 00:01:11,420 --> 00:01:14,020 So this is the text index, 15 00:01:14,030 --> 00:01:16,560 now let's have a look at such a text index 16 00:01:16,580 --> 00:01:24,360 and for that I'll insert many products into a newly created products collection here. 17 00:01:24,370 --> 00:01:27,280 So the first product will have a name or 18 00:01:27,280 --> 00:01:37,880 let's say a title to mix it up, the title is "a book", and it will have a description, "this is an awesome 19 00:01:37,910 --> 00:01:41,030 book about a young artist".
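To make the idea of "an array of keywords" concrete, here is a minimal sketch of what happens conceptually to the description above. The stop-word list and the plural-stripping `naiveStem` function are assumptions made up for this illustration; MongoDB's real tokenizer and stemmer are more sophisticated and language-aware.

```javascript
// Illustrative only: a tiny stop-word list and a naive "stemmer".
// Both are assumptions for this sketch, not MongoDB's actual implementation.
const STOP_WORDS = new Set(["a", "an", "the", "is", "this", "about", "and", "it", "of"]);

function naiveStem(word) {
  // Strip a trailing plural "s" from longer words ("books" -> "book").
  return word.length > 3 && word.endsWith("s") && !word.endsWith("ss")
    ? word.slice(0, -1)
    : word;
}

function toKeywords(text) {
  return text
    .toLowerCase()                                      // casing is ignored
    .split(/[^a-z]+/)                                   // split into single words
    .filter((w) => w.length > 1 && !STOP_WORDS.has(w))  // drop stop words
    .map(naiveStem);                                    // reduce words to a stem
}

console.log(toKeywords("This is an awesome book about a young artist."));
// [ 'awesome', 'book', 'young', 'artist' ]
```

Only the keywords survive, which is why a search for a word like "is" would not be useful against such an index.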
20 00:01:42,050 --> 00:01:53,350 The second document I add here should also describe a product. It will have a title of "red t-shirt" and there, 21 00:01:53,890 --> 00:02:06,070 we'll have a description of "this t-shirt is red and it's pretty awesome". 22 00:02:10,410 --> 00:02:14,090 So after finding that missing quotation mark, it now worked fine for me 23 00:02:14,310 --> 00:02:20,380 and now if I have a look at my products and I pretty print this, here are my two products. 24 00:02:20,790 --> 00:02:22,700 Now let's use a text index, 25 00:02:22,710 --> 00:02:28,260 let's use it on the description field let's say. So first of all, I'll output our products again so that we 26 00:02:28,260 --> 00:02:29,190 can see them 27 00:02:29,400 --> 00:02:34,640 and now we can create a new text index, again with createIndex as we create all indexes 28 00:02:35,010 --> 00:02:38,110 and I say I want to create it on description and now, importantly, 29 00:02:38,400 --> 00:02:40,100 don't add 1 or -1, 30 00:02:40,110 --> 00:02:47,460 you could do this but then it would simply index this as a single field index and you could then search 31 00:02:47,460 --> 00:02:54,010 for exactly this text to utilize the index but not for individual keywords. You need the text index 32 00:02:54,150 --> 00:02:56,410 so that MongoDB splits this up, 33 00:02:56,460 --> 00:02:58,880 so using 1 here will not work, so 34 00:02:58,980 --> 00:03:07,580 let's quickly drop that index like this and let's recreate the index but now not with a 1 but with the 35 00:03:07,580 --> 00:03:14,270 special text keyword, "text" in quotation marks. This will create a text index which is a special kind of 36 00:03:14,270 --> 00:03:21,170 index where MongoDB will go ahead and as I mentioned, remove all the stop words and store all the 37 00:03:21,170 --> 00:03:25,610 keywords in an array essentially, 38 00:03:25,730 --> 00:03:27,000 so let's have a look at this.
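The steps just described can be sketched roughly as follows. The function name `setUpTextIndex` is a hypothetical wrapper introduced here so the steps sit in one place; the titles and descriptions are transcribed from speech, so their exact casing is an assumption. In the shell, each call would normally be typed on its own against `db.products`.

```javascript
// The two products from the walkthrough (casing is an assumption).
const products = [
  { title: "A Book", description: "This is an awesome book about a young artist." },
  { title: "Red T-Shirt", description: "This t-shirt is red and it's pretty awesome." },
];

// Hypothetical wrapper; `db` is the database handle the mongo shell provides.
function setUpTextIndex(db) {
  db.products.insertMany(products);

  // A value of 1 creates an ordinary single field index: it only helps
  // when matching the exact full text, not individual keywords.
  db.products.createIndex({ description: 1 });
  db.products.dropIndex({ description: 1 });

  // The string "text" instead of 1 or -1 creates the text index.
  return db.products.createIndex({ description: "text" });
}

// In the shell: setUpTextIndex(db)
```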
39 00:03:27,020 --> 00:03:29,230 This is the data we have in there in general 40 00:03:29,540 --> 00:03:33,690 and now let's use products and find and 41 00:03:33,830 --> 00:03:34,580 here 42 00:03:35,000 --> 00:03:38,260 let's now use the special $text key 43 00:03:39,370 --> 00:03:45,990 and search. For that, you pass a document as a value for $text and there, you need 44 00:03:46,000 --> 00:03:46,960 $search. 45 00:03:46,960 --> 00:03:52,000 Now you might be wondering why do I not need to specify the field in which I want to search, 46 00:03:52,000 --> 00:03:54,120 why don't I have to add description, 47 00:03:54,310 --> 00:03:57,290 instead we just say hey, I want to search for some text. 48 00:03:57,340 --> 00:04:04,300 The reason for that is that you may only have one text index per collection because text indexes are 49 00:04:04,300 --> 00:04:06,610 pretty expensive as you can imagine, 50 00:04:06,610 --> 00:04:11,410 if you have a lot of long text that has to be split up, you don't want to do this like 10 times per 51 00:04:11,410 --> 00:04:16,080 collection and therefore, there is only one text index that this search could look into. 52 00:04:16,240 --> 00:04:21,730 You can actually merge multiple fields into one text index as I will show you in a second and the search will 53 00:04:21,730 --> 00:04:24,020 then look through all of them automatically 54 00:04:24,250 --> 00:04:25,990 but you can only do it like this, 55 00:04:25,990 --> 00:04:31,050 you can't say hey, I want to search for text in description like this, this won't work. 56 00:04:31,090 --> 00:04:34,640 So now for $search, we simply enter the words we want to look for 57 00:04:34,870 --> 00:04:38,480 like awesome and the casing is not important here by the way, 58 00:04:38,590 --> 00:04:41,380 everything is stored as lowercase.
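As a rough sketch, the query described here looks like this. The helper names `keywordFilter` and `findByKeywords` are hypothetical and only exist to keep the example self-contained; in the shell you would typically pass the filter document straight to `db.products.find(...)`.

```javascript
// Build the filter for a keyword search. Note that no field name appears:
// $text always targets the collection's single text index.
function keywordFilter(words) {
  return { $text: { $search: words } };
}

// Hypothetical helper; `db` is the database handle the mongo shell provides.
function findByKeywords(db, words) {
  return db.products.find(keywordFilter(words));
}

// In the shell: findByKeywords(db, "awesome").pretty()
// Equivalent to: db.products.find({ $text: { $search: "awesome" } }).pretty()
```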
59 00:04:41,380 --> 00:04:49,360 If I hit enter and I pretty print this, you'll see I find both products because in both products, we have 60 00:04:49,360 --> 00:04:51,150 the term awesome. 61 00:04:51,650 --> 00:04:53,090 Now let's repeat this 62 00:04:53,140 --> 00:04:56,540 but let's now just search for the term book. 63 00:04:56,680 --> 00:05:00,960 So here I'll search for book 64 00:05:01,130 --> 00:05:06,780 and now you see I only get the document where we have book in the text. 65 00:05:06,800 --> 00:05:11,860 Now what if I search for red book? I have red in the other document. 66 00:05:12,170 --> 00:05:17,810 Well then I find both again because this is now actually not treated as one connected phrase where 67 00:05:17,810 --> 00:05:23,150 it would look for a red book but it simply looks for documents that have red 68 00:05:23,180 --> 00:05:30,120 in the text and for documents that have book in the text. Of course you can also search for a specific phrase 69 00:05:30,120 --> 00:05:36,920 though, you can search for phrases by wrapping that phrase in double quotes and since we are in 70 00:05:36,920 --> 00:05:41,500 double quotes already, we have to escape them with a backslash before each inner double quote. 71 00:05:41,690 --> 00:05:46,580 We have this at the beginning and at the end of the phrase and now we don't find anything because 72 00:05:46,580 --> 00:05:51,380 we have no red book phrase anywhere in our text, for example 73 00:05:51,380 --> 00:05:53,210 awesome book would work though, 74 00:05:53,210 --> 00:05:58,250 so if I look for the awesome book phrase, we would find this document because we have awesome book right 75 00:05:58,250 --> 00:05:59,080 there. 76 00:05:59,090 --> 00:06:02,870 Now this is really powerful and much faster than regular expressions, 77 00:06:03,050 --> 00:06:07,450 so this is definitely the way to go if you need to look for keywords in text.
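The difference between searching for separate words and searching for a phrase comes down to the escaped double quotes inside the `$search` string. Here is a small sketch of the three filters discussed; `phraseFilter` is a hypothetical helper added for illustration, and the comments about matches restate what was observed in the walkthrough rather than output produced by this snippet.

```javascript
// Separate words: matches documents containing "red" OR "book"
// (both products in the walkthrough).
const anyWord = { $text: { $search: "red book" } };

// Exact phrase: inner double quotes are escaped with a backslash.
// No product contains the phrase "red book", so nothing was found.
const missingPhrase = { $text: { $search: "\"red book\"" } };

// The phrase "awesome book" appears in the first product's description.
const existingPhrase = { $text: { $search: "\"awesome book\"" } };

// Hypothetical helper that wraps a phrase in double quotes for you.
function phraseFilter(phraseText) {
  return { $text: { $search: `"${phraseText}"` } };
}

// In the shell: db.products.find(phraseFilter("awesome book")).pretty()
```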