1 00:00:02,510 --> 00:00:07,250 Now we had a look at single field and compound indexes and what we can do with them and how we can 2 00:00:07,250 --> 00:00:08,610 measure the efficiency, 3 00:00:08,750 --> 00:00:11,460 all super important things. 4 00:00:11,480 --> 00:00:18,830 Now let me introduce two new types of indexes and the first one is the so-called multikey index. 5 00:00:18,840 --> 00:00:26,090 For this, I made sure that I dropped my contacts collection, so that huge collection which we imported 6 00:00:26,090 --> 00:00:27,940 before, I already did drop it, 7 00:00:27,950 --> 00:00:29,360 hence I see false here 8 00:00:30,150 --> 00:00:38,630 and now I will add a new value here, contacts insertOne and that new value, 9 00:00:38,630 --> 00:00:39,360 am I 10 00:00:39,530 --> 00:00:44,050 and here I will add hobbies which happens to be an array, an array of strings, 11 00:00:44,220 --> 00:00:48,780 let's say cooking, sports. 12 00:00:48,800 --> 00:00:53,940 I'll also add another array, addresses, let's say I have multiple ones, 13 00:00:53,940 --> 00:00:56,470 this will be an array and each array is a document there, 14 00:00:56,600 --> 00:01:01,470 so embedded documents in an array. There to keep it short and simple, 15 00:01:01,490 --> 00:01:11,490 I'll just have the street, Main Street, add another document with the street, let's say second street, 16 00:01:11,570 --> 00:01:13,030 doesn't really matter. 17 00:01:13,130 --> 00:01:14,210 If I hit enter here, 18 00:01:14,450 --> 00:01:18,330 let's quickly have a look at what I inserted, 19 00:01:18,350 --> 00:01:19,300 this is looking good, 20 00:01:19,490 --> 00:01:26,620 I got Max, two hobbies which are just strings and two addresses which are documents. Now 21 00:01:26,660 --> 00:01:30,230 let's index an array because that is possible in 22 00:01:30,260 --> 00:01:31,630 mongodb too. 23 00:01:31,790 --> 00:01:39,370 For this I'll of course create an index again as we did it before and this time, I will index hobbies 24 00:01:39,380 --> 00:01:42,850 and if you remember, hobbies was an array right, 25 00:01:42,930 --> 00:01:45,530 you see it here, it was an array of strings. 26 00:01:45,630 --> 00:01:48,090 Let's hit enter and this works, 27 00:01:48,120 --> 00:01:50,120 now let's try using that 28 00:01:50,550 --> 00:01:59,870 and first of all, let's execute a query where I look for hobbies being equal to sports and I do find 29 00:01:59,870 --> 00:02:03,040 something, I can pretty print that to make it easier to read, 30 00:02:03,050 --> 00:02:07,360 I do find Max of course because I do have sports in there. Now 31 00:02:07,400 --> 00:02:18,590 the interesting thing is if I explain this with execution stats, I see that the winning plan was an 32 00:02:18,590 --> 00:02:29,100 index scan and we see that there, isMultiKey is set to true for my hobbies index. Mongodb treats 33 00:02:29,100 --> 00:02:36,170 this as a multikey index because it is an index on an array of values and technically, multikey indexes 34 00:02:36,260 --> 00:02:40,610 are working like normal indexes but they are stored up differently. 35 00:02:40,660 --> 00:02:47,460 What mongodb does is it pulls out all the values in your index key, 36 00:02:47,480 --> 00:02:53,020 so in hobbies in my case here, so it pulls out all the values in the array I stored in there and stores 37 00:02:53,040 --> 00:02:56,110 them as separate elements in an index. 38 00:02:56,120 --> 00:03:01,970 This of course means that multikey indexes for a lot of documents are bigger than single field indexes 39 00:03:02,240 --> 00:03:08,610 because if every document has an array with let's say four values on average and you have a thousand 40 00:03:08,630 --> 00:03:15,650 documents and that array field is what you index, you would store four thousand elements because four 41 00:03:15,650 --> 00:03:17,180 times one thousand. 42 00:03:17,750 --> 00:03:19,640 So this is something to keep in mind, 43 00:03:19,670 --> 00:03:23,050 multikey indexes are possible but typically are also bigger, 44 00:03:23,150 --> 00:03:24,440 doesn't mean you shouldn't use them. 45 00:03:24,500 --> 00:03:28,350 If you typically query for an array and the value in an array, 46 00:03:28,430 --> 00:03:33,790 well then it makes sense to turn this array into a multikey index, that is perfectly fine. 47 00:03:34,190 --> 00:03:38,950 So now we learned that we can use a multikey index and we are using one on hobbies, 48 00:03:38,960 --> 00:03:47,420 now let's also add one on addresses, contacts, create index and there, I'll index addresses, 49 00:03:47,420 --> 00:03:49,370 one. Now this also worked, 50 00:03:49,370 --> 00:03:51,770 it created that index 51 00:03:51,770 --> 00:03:53,760 and now let's try to utilize that. 52 00:03:54,110 --> 00:04:01,370 Let's reach out to contacts and let's explain what's happening with execution stats, so that we can see 53 00:04:01,370 --> 00:04:03,080 if that index gets used 54 00:04:03,380 --> 00:04:18,540 and let's find all addresses where the street is Main Street. If I hit enter here, we see it actually used 55 00:04:18,550 --> 00:04:23,040 a collection scan and not our index. 56 00:04:23,060 --> 00:04:29,810 The reason for that is that our index of course holds the whole documents and not the fields of the 57 00:04:29,810 --> 00:04:30,500 documents, 58 00:04:30,610 --> 00:04:37,310 so mongodb does not go so far to pull out the elements of an array and then pull out all field 59 00:04:37,310 --> 00:04:39,280 values of a nested document 60 00:04:39,290 --> 00:04:40,960 that array might hold. 61 00:04:41,180 --> 00:04:48,310 So our index would only get used if I were not looking for the street in the addresses 62 00:04:48,380 --> 00:04:55,650 I have in that array but if I were looking for addresses holding some document, 63 00:04:55,700 --> 00:05:04,130 so if I were looking for addresses where street is Main Street, if I were looking for that, you see it 64 00:05:04,130 --> 00:05:10,330 would have used an index scan because it is the whole document which is in our index in the end. Mongodb 65 00:05:10,430 --> 00:05:16,340 pulls out the elements of the array for addresses as single element in the array happens to be 66 00:05:16,340 --> 00:05:17,100 a document, 67 00:05:17,120 --> 00:05:22,340 so that document is what mongodb pulled out and what mongodb then stored in the index 68 00:05:22,430 --> 00:05:29,810 registry and therefore this is something to be aware. What you can do though is you can create an index 69 00:05:30,590 --> 00:05:33,240 on addresses.street, 70 00:05:33,370 --> 00:05:40,130 this also will be a multikey index as you can see if I now repeat my earlier query where I was looking 71 00:05:40,130 --> 00:05:41,430 for addresses street, 72 00:05:41,510 --> 00:05:43,560 if I repeat that query now, 73 00:05:43,940 --> 00:05:48,450 now you see it uses an index scan for this and it is a multikey index, 74 00:05:48,620 --> 00:05:56,040 so you can also use an index on a field in an embedded document which is part of an array 75 00:05:56,090 --> 00:06:03,200 with that multikey feature. You should just be aware that of course using multi multikey features 76 00:06:03,200 --> 00:06:07,750 on a single collection will quickly lead to some performance issues with rights 77 00:06:07,820 --> 00:06:13,390 because for every new document you add, all these multikey indexes have to be updated 78 00:06:13,430 --> 00:06:19,400 and if you add a new document with 10 values in that array which you happen to store in a multikey 79 00:06:19,400 --> 00:06:23,730 index, then these 10 new entries need to be added to the index registry 80 00:06:23,810 --> 00:06:29,600 and if you then have four or five of these multikey indexes per document, well then you quickly are 81 00:06:29,720 --> 00:06:31,560 in a low performance world. 82 00:06:31,700 --> 00:06:39,050 Still multikey index is super helpful if you have queries that regularly target array values or even 83 00:06:39,140 --> 00:06:43,850 nested values or values in an embedded document in arrays. 84 00:06:44,000 --> 00:06:50,700 There are a couple of restrictions or one important restriction to be precise when using multikey indexes 85 00:06:50,960 --> 00:06:58,010 and that is that if you add an index, a multikey index and you add it as part of a compound index, that 86 00:06:58,010 --> 00:06:59,480 is generally possible, 87 00:06:59,570 --> 00:07:06,830 I can add name in ascending order and name is not an array, it's just a single field and I can then add 88 00:07:06,830 --> 00:07:08,280 hobbies like this, 89 00:07:08,330 --> 00:07:15,350 this works. However what won't work is that I say I want have a compound index made up of two or more 90 00:07:15,410 --> 00:07:16,900 multikey indexes, 91 00:07:17,000 --> 00:07:23,840 so addresses one and hobbies one will not work because you can't index parallel arrays. 92 00:07:23,990 --> 00:07:25,870 The reason for that is simple, 93 00:07:26,000 --> 00:07:33,960 mongodb would have to store the cartesian product of the values of both indexes, of both arrays, 94 00:07:34,070 --> 00:07:40,130 so it would have to pull out all the addresses and for every address, it would have to store all the 95 00:07:40,130 --> 00:07:40,690 hobbies. 96 00:07:40,790 --> 00:07:46,790 So if you have two addresses and five hobbies, you already have to store 10 values and that of course 97 00:07:46,790 --> 00:07:50,410 becomes even worse the more values you have in addresses, 98 00:07:50,450 --> 00:07:53,300 so that is why this is not possible. 99 00:07:53,330 --> 00:08:00,860 Compound indexes with multikey indexes are possible as you see here but only with one multikey index, 100 00:08:00,880 --> 00:08:07,580 so with one array, not with multiple arrays. You can have multiple multikey indexes as you can see here 101 00:08:07,790 --> 00:08:09,210 in separate indexes 102 00:08:09,320 --> 00:08:13,670 but in one and the same index, only one array can be included.