1 00:00:02,190 --> 00:00:04,550 Now we had our first look at indexes, 2 00:00:04,890 --> 00:00:12,290 now let's again look at a single document in our contacts database and we created an index for the age 3 00:00:12,290 --> 00:00:12,930 here, 4 00:00:12,960 --> 00:00:16,390 now we dropped that in the last lecture but previously we did. 5 00:00:16,410 --> 00:00:19,450 Now of course you can not just create indexes on fields that have a number, 6 00:00:19,620 --> 00:00:25,200 you can also create indexes on fields that have some text and that makes a lot of sense because well, 7 00:00:25,230 --> 00:00:31,800 text is also something you regularly search for and it can perfectly be sorted. Booleans on the other hand are something 8 00:00:31,820 --> 00:00:35,090 that don't make too much sense because since you only got two kinds of values, 9 00:00:35,100 --> 00:00:36,280 true or false, 10 00:00:36,480 --> 00:00:40,170 chances are that the index will not speed up your queries too much. 11 00:00:40,200 --> 00:00:44,870 Now let's create a second index or actually a first one because I dropped one 12 00:00:45,150 --> 00:00:53,320 and let's now use a text index and for that, I'll try creating an index on the gender, male here. 13 00:00:54,610 --> 00:01:03,820 I'll do this on my collection with create index and I'll look for the gender field to be sorted ascendingly. 14 00:01:03,830 --> 00:01:08,140 Now of course one problem you might already think of is that for gender, 15 00:01:08,240 --> 00:01:13,060 just like for a boolean, we typically only have two values and indeed 16 00:01:13,250 --> 00:01:24,800 if I do run explain command with execution stats on my contacts and I try to find all contacts where 17 00:01:24,800 --> 00:01:30,520 gender is equal to male, I should get back a lot of keys right, 18 00:01:30,550 --> 00:01:35,770 I should get back like half the documents because this random collection of persons is pretty evenly 19 00:01:35,770 --> 00:01:36,570 distributed 20 00:01:36,670 --> 00:01:41,070 and indeed of my 5000 documents, 2400 are male. 21 00:01:41,350 --> 00:01:46,900 So this again is probably an example for an index that doesn't make too much sense or that doesn't really 22 00:01:46,930 --> 00:01:52,570 speed up our queries because we fetched a majority or a large chunk of our collection every time, 23 00:01:52,570 --> 00:01:54,990 still we use the index here of course. 24 00:01:55,120 --> 00:02:00,220 This also tells us something about the winning plans and rejected plans by the way, you see that mongodb 25 00:02:00,220 --> 00:02:05,040 didn't seem to consider a collection scan because we have never checked the plans, 26 00:02:05,070 --> 00:02:12,570 so for the rejected plans, it also looks at other indexes but not really at the collection scan as an alternative. 27 00:02:12,640 --> 00:02:18,880 So this index alone doesn't make too much sense but there is another kind of index we can use because 28 00:02:18,880 --> 00:02:26,670 let's say we want to find all persons who are older than 30 and male or older than 40 and male 29 00:02:26,830 --> 00:02:30,370 and for this, we can create a so-called compound index. 30 00:02:30,370 --> 00:02:31,690 Now let me drop this index 31 00:02:31,690 --> 00:02:43,630 first of all, db contacts drop index and I'll drop my gender ascending index here and let's create 32 00:02:43,660 --> 00:02:44,590 a new one here 33 00:02:44,860 --> 00:02:50,200 and this index will not just have the gender but actually as a first key, let's say I want to look for 34 00:02:50,210 --> 00:02:53,880 dob.age and that sorts this in ascending order too. 35 00:02:53,930 --> 00:02:58,810 Now the order of these two fields here does matter because a compound index simply is an index with 36 00:02:58,810 --> 00:03:00,420 more than one field, like here 37 00:03:00,460 --> 00:03:06,930 we got two fields and this will essentially store one index where each entry in the index is now not on 38 00:03:06,940 --> 00:03:12,580 a single value but two combined values. So it does not create two indexes, that's important, 39 00:03:12,610 --> 00:03:19,540 it creates one index but one index where every element is a well, connected value. So now it will simply 40 00:03:19,540 --> 00:03:29,080 create pairs of ages and genders, so we'll have 30 male, 30 male, 30 female, 31 male, 31 female and so on 41 00:03:29,200 --> 00:03:36,230 in the index list and the order here defines which kind of pairs mongodb creates, 42 00:03:36,240 --> 00:03:49,440 does it create let's say 31 male field or a male 31 or 33 field and this will be important for your 43 00:03:49,440 --> 00:03:50,780 queries as you will see. 44 00:03:50,820 --> 00:03:57,900 So let's create that index, created the index successfully and now we can utilize such a compound 45 00:03:57,900 --> 00:04:01,040 index with two different queries. 46 00:04:01,080 --> 00:04:05,550 The obvious query that takes advantage of the index is that 47 00:04:05,550 --> 00:04:11,130 if I explain without execution steps here, I find 48 00:04:12,330 --> 00:04:13,470 query where 49 00:04:13,560 --> 00:04:24,900 I try to find all people where, whoops, the age is let's say 35 and all the people are males, 50 00:04:25,020 --> 00:04:29,610 so I can also set gender to male. 51 00:04:29,610 --> 00:04:32,450 So I got two elements in my filter here, 52 00:04:32,550 --> 00:04:36,630 the age and the gender and these are the two elements I also have in my index, here 53 00:04:36,660 --> 00:04:37,760 the order does not matter 54 00:04:37,760 --> 00:04:45,450 by the way. If I hit enter, you see it actually did an index scan with our index name and this name is 55 00:04:45,450 --> 00:04:46,350 auto-generated, 56 00:04:46,430 --> 00:04:51,870 it indicates that this is a compound index because the name is generated from the values that are placed 57 00:04:51,930 --> 00:04:53,270 in the index. 58 00:04:53,280 --> 00:05:01,470 So this is a compound index and this speeds up our queries for requests that are well, going to these 59 00:05:01,470 --> 00:05:08,100 two fields because we have these two in an index. Another query that is sped up by that same index 60 00:05:08,350 --> 00:05:10,690 is if I just look for the age, 61 00:05:10,920 --> 00:05:12,880 if I do that, you'll see here 62 00:05:13,020 --> 00:05:19,590 it also used an index scan and it used that same compound index even though I never specified the gender 63 00:05:19,590 --> 00:05:20,860 in this request here. 64 00:05:21,180 --> 00:05:26,310 But the compound index can be used from left to right so to say, 65 00:05:26,460 --> 00:05:31,710 so if you have age and gender in there and the leftmost value as you created the index was age and then 66 00:05:31,710 --> 00:05:39,670 the second value was gender, then you can use this index either for all finds that look just for the age, 67 00:05:39,690 --> 00:05:45,810 so for the left part or left and gender. Gender alone won't work though, 68 00:05:45,810 --> 00:05:54,980 so if I try to look for all males with gender male, you'll see that of course works but it uses a collection 69 00:05:54,980 --> 00:06:00,110 scan and not an index scan because it can't look into that second value standalone 70 00:06:00,110 --> 00:06:06,200 and that makes a lot of sense because if you got a couple of pairs of values in that index, 33 male, 71 00:06:06,260 --> 00:06:13,640 33 male, 33 female and so on, then you can obviously use that index if you just look for 33 72 00:06:13,730 --> 00:06:19,100 because you just have to go through all the values because they are sorted by these key value pairs and the 73 00:06:19,100 --> 00:06:22,660 first value of course defines the sorting for the overall index, 74 00:06:22,670 --> 00:06:30,080 so 33, 33 comes before 34. The male and female parts are then also sorted but only within 75 00:06:30,080 --> 00:06:30,990 their category, 76 00:06:31,010 --> 00:06:34,510 so only for 33, 34, 35, 77 00:06:34,520 --> 00:06:38,480 of course the male and female parts are not sorted on the overall list, 78 00:06:38,480 --> 00:06:41,960 makes sense right, because they're grouped together with the age numbers. 79 00:06:42,170 --> 00:06:46,030 So therefore, you can utilize a compound index from left to right, 80 00:06:46,040 --> 00:06:52,070 so if you had a compound index with three or four elements, you can have up to 31, then you could also 81 00:06:52,070 --> 00:06:53,200 just use the first two, 82 00:06:53,240 --> 00:06:56,060 the first three or all four or just the first 83 00:06:56,120 --> 00:06:59,590 but you can't use the fourth one or the third one. 84 00:06:59,600 --> 00:07:05,030 These are some restrictions you have on compound indexes but compound indexes in general allow you 85 00:07:05,030 --> 00:07:08,400 to speed up queries that use multiple values 86 00:07:08,510 --> 00:07:11,780 if you create a compound index on these multiple values.