1 00:00:02,500 --> 00:00:07,450 So let's now have a look at indexes in practice, understand the impact they have, and see how we 2 00:00:07,450 --> 00:00:11,200 can create them. For this, you should have a running MongoDB server; 3 00:00:11,200 --> 00:00:17,050 I have mine running in a second tab. You'll also find a starting dataset attached to this video, 4 00:00:17,080 --> 00:00:23,080 the persons.json file. Download that file, store it somewhere on your machine, and then 5 00:00:23,080 --> 00:00:28,960 navigate in the terminal or command prompt into the folder where you stored it so that you can 6 00:00:28,960 --> 00:00:31,860 easily import it with mongoimport. 7 00:00:32,050 --> 00:00:33,690 So simply type mongoimport, 8 00:00:33,760 --> 00:00:35,310 then the path to that file, 9 00:00:35,330 --> 00:00:38,260 and since I'm running this in the folder where the file is, 10 00:00:38,260 --> 00:00:43,900 I can just type the file name. Then add the database where you want to store this; I'll simply create 11 00:00:43,900 --> 00:00:49,450 a new one, contactData. Then add the collection you want to create for that data; I'll name mine 12 00:00:49,450 --> 00:00:50,620 contacts. 13 00:00:50,620 --> 00:00:56,490 You also need to add --jsonArray at the end so that the file gets imported correctly, and it should import 14 00:00:56,500 --> 00:00:58,540 5000 documents. 15 00:00:58,570 --> 00:01:03,820 When you connect to your database thereafter, you should have that new contactData database, which 16 00:01:03,820 --> 00:01:05,030 we can use now, 17 00:01:05,050 --> 00:01:10,520 so use contactData, and in there, you should have this contacts collection. 18 00:01:10,960 --> 00:01:12,950 So let's now use that contacts collection and 19 00:01:13,010 --> 00:01:18,250 first of all have a look at a single contact. For this, I'll reach out to my contacts and use findOne to get one 20 00:01:18,250 --> 00:01:18,990 element.
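For reference, the import described above roughly corresponds to `mongoimport persons.json -d contactData -c contacts --jsonArray`, followed by `use contactData` and `db.contacts.findOne()` in the shell (contactData is a reading of the spoken "contact data"). The `--jsonArray` flag is needed because persons.json is one top-level JSON array. The plain JavaScript sketch below does not call mongoimport; using two made-up records and a hypothetical helper, splitJsonArray, it only illustrates how one array-shaped file maps to one document per array element.

```javascript
// Hypothetical stand-in for persons.json: a single top-level JSON array.
// The real file holds 5000 entries; these two records are made up.
const fileContents = `[
  { "gender": "female", "name": { "first": "ada", "last": "lee" }, "dob": { "age": 64 } },
  { "gender": "male", "name": { "first": "tom", "last": "ray" }, "dob": { "age": 31 } }
]`;

// Hypothetical helper: each element of the top-level array becomes one document.
// Anything that is not an array is rejected, mirroring why the flag matters.
function splitJsonArray(text) {
  const parsed = JSON.parse(text);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a top-level JSON array");
  }
  return parsed;
}

const documents = splitJsonArray(fileContents);
console.log(documents.length); // 2
console.log(documents[0].name.first); // ada
```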
21 00:01:19,330 --> 00:01:21,180 So this is essentially what we have in there: 22 00:01:21,190 --> 00:01:23,530 some random person data. 23 00:01:23,530 --> 00:01:29,800 Each person has an ID, a gender, a name, which is actually a nested document as you can see, and a location, which 24 00:01:29,800 --> 00:01:33,630 is also a nested document, even with coordinates here. 25 00:01:33,790 --> 00:01:43,030 We've got a time zone, an e-mail, login data, a date of birth (dob), which holds both the date and the current age, and a 26 00:01:43,030 --> 00:01:43,930 registration date. 27 00:01:43,930 --> 00:01:50,790 Let's say this is for a web platform where people can manage their contacts, and each contact therefore also 28 00:01:50,830 --> 00:01:54,700 has a date when the contact signed up, and so on. 29 00:01:54,700 --> 00:01:58,570 So we get this basic dummy person data which we can use. 30 00:01:58,570 --> 00:02:03,940 Now let's run a query and find all people who are older than 60. 31 00:02:04,240 --> 00:02:11,650 For this, I will clear my shell, then reach out to contacts and run the find method here, 32 00:02:12,250 --> 00:02:15,280 and I'm looking for the dob.age field, 33 00:02:15,280 --> 00:02:22,310 so for this field in the embedded document. I'll use a greater than query, $gt, and the greater than query 34 00:02:22,310 --> 00:02:32,410 will be looking for people older than 60. Let's also pretty print this, and we get a bunch of results: 61 35 00:02:32,430 --> 00:02:35,300 is the age here, 61 again, 36 00:02:35,310 --> 00:02:39,240 so there are quite a few people who are older than 60, as it seems. 37 00:02:39,240 --> 00:02:43,230 You can also see how many results we got by adding count at the end, 38 00:02:43,240 --> 00:02:46,170 so 1222.
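The query described above is `db.contacts.find({ "dob.age": { $gt: 60 } }).pretty()`, and chaining `.count()` instead returns 1222 on this dataset. The sketch below is a simplified in-memory model, not MongoDB's matcher: a hypothetical getPath helper resolves dot notation such as "dob.age", and findGreaterThan inspects every document, which is what a collection scan does. Only numeric values are compared here; MongoDB's real comparison rules (type bracketing, array fields) are richer.

```javascript
// Resolve a dot-notation path such as "dob.age" inside a nested document.
// Returns undefined as soon as a segment is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Simplified stand-in for find({ [path]: { $gt: threshold } }):
// every document is inspected (a full scan); only numbers are compared.
function findGreaterThan(collection, path, threshold) {
  return collection.filter((doc) => {
    const value = getPath(doc, path);
    return typeof value === "number" && value > threshold;
  });
}

// Hypothetical sample documents (not taken from persons.json).
const contacts = [
  { name: { first: "ada" }, dob: { age: 64 } },
  { name: { first: "tom" }, dob: { age: 31 } },
  { name: { first: "eve" }, dob: { age: 60 } },
  { name: { first: "sam" }, dob: { age: 72 } },
];

const olderThan60 = findGreaterThan(contacts, "dob.age", 60);
console.log(olderThan60.map((d) => d.name.first)); // [ 'ada', 'sam' ]
console.log(olderThan60.length); // 2
```

Note that eve, aged exactly 60, is excluded: `$gt` is a strict comparison, matching "older than 60".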
39 00:02:46,200 --> 00:02:51,420 Now, of course, this was a super fast query, but we also don't have that many documents in this collection; 40 00:02:51,510 --> 00:02:53,240 that's important to keep in mind. 41 00:02:53,640 --> 00:03:01,620 Now, in order to determine whether an index can help us, or to see what MongoDB actually does, 42 00:03:01,620 --> 00:03:06,990 MongoDB gives us a nice tool to analyze how it executed the query, 43 00:03:07,260 --> 00:03:13,530 and this tool is a simple method we add to our query. Right after you reach out to the collection, 44 00:03:13,560 --> 00:03:17,400 you can add the explain method and then chain your normal query. 45 00:03:17,400 --> 00:03:19,830 explain works for methods like find, update, and delete, 46 00:03:19,890 --> 00:03:20,940 but not for insert, 47 00:03:20,940 --> 00:03:26,140 so it basically works for the methods where you narrow down documents, where you find documents. 48 00:03:26,790 --> 00:03:37,920 Here we can of course repeat our condition, looking for dob.age to be greater than 60. 49 00:03:37,920 --> 00:03:45,310 Now we get a detailed description of what MongoDB did and how it derived our results. 50 00:03:45,540 --> 00:03:51,620 MongoDB thinks in so-called plans, and plans are simply alternatives it considers for executing that query, 51 00:03:52,280 --> 00:03:54,260 and in the end it will find a winning plan. 52 00:03:54,300 --> 00:03:56,950 I'll come back to how MongoDB determines this later, 53 00:03:57,210 --> 00:04:00,720 but that winning plan is essentially what it did to get our results, 54 00:04:00,810 --> 00:04:04,060 and you see here, the winning plan was a full collection scan (COLLSCAN).
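The call described above is `db.contacts.explain().find({ "dob.age": { $gt: 60 } })`. Its output contains a `queryPlanner` section with a `winningPlan` (a tree of stages linked through `inputStage`) and a `rejectedPlans` array. The object below is a trimmed, illustrative shape reflecting what the video reports (a COLLSCAN winning plan, no rejected plans); real output has many more fields, and the exact layout can differ between MongoDB server versions. The stageChain helper is hypothetical.

```javascript
// Trimmed, illustrative explain() output for the query above.
// Only the fields used below are kept; real output contains far more.
const explainOutput = {
  queryPlanner: {
    winningPlan: {
      stage: "COLLSCAN",
      filter: { "dob.age": { $gt: 60 } },
      direction: "forward",
    },
    rejectedPlans: [],
  },
};

// Hypothetical helper: walk the winning plan from the top stage
// down through its inputStage chain and collect the stage names.
function stageChain(plan) {
  const stages = [];
  let current = plan;
  while (current) {
    stages.push(current.stage);
    current = current.inputStage;
  }
  return stages;
}

const planner = explainOutput.queryPlanner;
console.log(stageChain(planner.winningPlan)); // [ 'COLLSCAN' ]
console.log(planner.rejectedPlans.length); // 0
```

The same helper also reads nested plans: a plan shaped like `{ stage: "FETCH", inputStage: { stage: "IXSCAN" } }` yields `[ 'FETCH', 'IXSCAN' ]`.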
55 00:04:04,470 --> 00:04:09,330 We could also have rejected plans, but for that we would need alternatives, and without an index, a full 56 00:04:09,330 --> 00:04:11,760 collection scan is the only thing MongoDB can do, 57 00:04:11,760 --> 00:04:17,490 so there were no alternatives, and therefore the only approach it had is, of course, the winning plan. 58 00:04:17,490 --> 00:04:22,010 Now we can get even more detailed output by re-running that command, 59 00:04:22,050 --> 00:04:27,690 but this time we pass an argument to explain, and that argument is a string that controls the verbosity 60 00:04:27,690 --> 00:04:28,810 of this command. 61 00:04:28,950 --> 00:04:35,010 If you pass "executionStats" here (make sure you type it correctly, with a capital S and a lower 62 00:04:35,010 --> 00:04:35,840 case e), 63 00:04:36,180 --> 00:04:41,780 you get a detailed output for this query and how the results were returned. 64 00:04:41,860 --> 00:04:47,710 There you'll see that the overall query took 5 milliseconds, which is of course super fast, but our collection 65 00:04:47,850 --> 00:04:48,770 is also not very big. 66 00:04:48,880 --> 00:04:54,270 If it were bigger, if it had millions of documents, this number would of course scale up. You also see 67 00:04:54,340 --> 00:05:00,130 that we had to examine 5000 documents in order to return our 1222, 68 00:05:00,160 --> 00:05:06,500 so there's quite a big gap here, and this already is a sign that this query is rather inefficient. 69 00:05:06,520 --> 00:05:09,760 Now let's add an index and see how this changes things. 70 00:05:09,760 --> 00:05:16,180 We add an index to a collection by typing db.contacts and then createIndex.
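The command described above is `db.contacts.explain("executionStats").find({ "dob.age": { $gt: 60 } })`; besides "executionStats", explain also accepts "queryPlanner" (the default) and "allPlansExecution". The sketch below takes the three numbers mentioned in the video (5 ms, 5000 documents examined, 1222 returned) and expresses the gap as a ratio using a hypothetical helper, examinedPerReturned. The helper is an illustration only, not part of MongoDB.

```javascript
// executionStats values reported in the video for the collection scan
// (all other fields omitted).
const executionStats = {
  executionTimeMillis: 5,
  totalDocsExamined: 5000,
  nReturned: 1222,
};

// Hypothetical helper: documents read per document returned.
// Close to 1 means little wasted work; larger values mean many documents
// were read only to be discarded. Division by zero is guarded.
function examinedPerReturned(stats) {
  if (stats.nReturned === 0) return Infinity;
  return stats.totalDocsExamined / stats.nReturned;
}

const ratio = examinedPerReturned(executionStats);
console.log(ratio.toFixed(2)); // 4.09
console.log(executionStats.totalDocsExamined - executionStats.nReturned); // 3778
```

In other words, roughly three out of four documents the scan looked at were thrown away again.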
71 00:05:16,420 --> 00:05:19,240 An index is defined as a document here, 72 00:05:19,360 --> 00:05:20,720 and the first value, 73 00:05:20,740 --> 00:05:24,730 the key, is the name of the field you want to create an index on, 74 00:05:24,940 --> 00:05:26,390 and in my case, that is dob.age 75 00:05:26,410 --> 00:05:26,950 . 76 00:05:27,100 --> 00:05:32,430 So you can create indexes on embedded fields just as on normal fields: 77 00:05:32,500 --> 00:05:39,280 you can use top-level fields or embedded fields, it doesn't matter. The value then determines whether 78 00:05:39,360 --> 00:05:46,210 MongoDB should store the list of values of that age field in ascending or descending order, 79 00:05:46,240 --> 00:05:49,310 so the index is kept sorted in one of the two directions. 80 00:05:49,480 --> 00:05:52,120 If you add a 1 here, it'll be ascending order, 81 00:05:52,210 --> 00:05:53,770 so lower values come first 82 00:05:53,770 --> 00:05:55,470 and higher values towards the end; 83 00:05:55,660 --> 00:06:01,150 if you add a -1 here, it's descending. For a single-field index like this, what you choose doesn't matter too much: 84 00:06:01,240 --> 00:06:06,850 even if you sort your results in the opposite direction, the sort will still be sped up, because 85 00:06:06,850 --> 00:06:10,000 MongoDB can traverse that index in both directions. 86 00:06:10,000 --> 00:06:14,950 So you can actually choose what you want here, and I'll go for ascending. 87 00:06:14,950 --> 00:06:18,500 Now this created an index. You see we had one index before, 88 00:06:18,580 --> 00:06:20,890 and we'll see which index that was in a second; 89 00:06:20,890 --> 00:06:22,520 now we have two.
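The index from the video is created with `db.contacts.createIndex({ "dob.age": 1 })` (or `-1` for descending), and `db.contacts.getIndexes()` lists the indexes on a collection; every collection also has a default index on `_id`. The sketch below is a deliberately simplified model, assuming a sorted array of (key, pointer) pairs in place of MongoDB's real B-tree, with hypothetical helper names, to show why one ascending index can be read in either direction.

```javascript
// Toy single-field index: (key, pointer) pairs sorted by key ascending,
// loosely modelling what createIndex({ "dob.age": 1 }) maintains.
// Pointers are array positions in a hypothetical in-memory collection.
function buildAscendingIndex(collection, getKey) {
  return collection
    .map((doc, pointer) => ({ key: getKey(doc), pointer }))
    .sort((a, b) => a.key - b.key || a.pointer - b.pointer);
}

// The same sorted entries can be walked front-to-back or back-to-front,
// so an ascending index can also serve a descending sort.
function scan(index, direction) {
  return direction === 1 ? index.slice() : index.slice().reverse();
}

// Hypothetical sample documents.
const people = [
  { name: "ada", dob: { age: 64 } },
  { name: "tom", dob: { age: 31 } },
  { name: "sam", dob: { age: 72 } },
];

const ageIndex = buildAscendingIndex(people, (doc) => doc.dob.age);
const ascending = scan(ageIndex, 1).map((e) => people[e.pointer].name);
const descending = scan(ageIndex, -1).map((e) => people[e.pointer].name);
console.log(ascending); // [ 'tom', 'ada', 'sam' ]
console.log(descending); // [ 'sam', 'ada', 'tom' ]
```

Direction starts to matter for compound indexes: a sort that mixes directions across several fields can only use an index whose directions match it or are its exact reverse.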
90 00:06:22,660 --> 00:06:26,410 So with that, let's repeat our explain command here 91 00:06:26,410 --> 00:06:34,120 for people older than 60. If we do that, we see that the execution time is now down 92 00:06:34,330 --> 00:06:35,640 from 5 to 3 milliseconds. 93 00:06:35,770 --> 00:06:38,050 Obviously we're talking about small numbers here, 94 00:06:38,170 --> 00:06:41,360 but the main thing is that it was sped up. 95 00:06:41,380 --> 00:06:44,980 We also see that there are two execution stages now, 96 00:06:45,340 --> 00:06:51,700 and that the first stage, the input stage, was an index scan (IXSCAN); in the plain explain output, that is what 97 00:06:51,700 --> 00:06:53,800 our winning plan essentially shows. 98 00:06:53,800 --> 00:06:58,620 So it did not do a full collection scan but an index scan instead, 99 00:06:58,660 --> 00:07:05,260 and there you see that it returned 1222 documents, or rather not documents, to be precise, 100 00:07:05,320 --> 00:07:09,850 but keys in the index with their respective pointers to documents. 101 00:07:10,000 --> 00:07:15,380 So the index scan does not return the documents, just the keys in the index and the pointers to the 102 00:07:15,370 --> 00:07:16,270 documents. 103 00:07:16,420 --> 00:07:17,530 It's the next stage, 104 00:07:17,590 --> 00:07:22,720 the fetch stage (FETCH), which then takes these pointers returned from the index, reaches out to the actual 105 00:07:22,720 --> 00:07:26,060 collection, and fetches the real documents from there. 106 00:07:26,170 --> 00:07:37,210 Therefore, in the end, we see that we only had to examine 1222 keys in our index to reach the 1222 107 00:07:37,210 --> 00:07:39,500 documents which are returned.
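To make the two stages concrete, here is a toy model with hypothetical helper names and a sorted array instead of a real B-tree. An index-scan step returns only (key, pointer) entries for dob.age > 60, found via binary search, and a separate fetch step follows those pointers into the collection. As in the video, every key examined corresponds to one returned document, and documents outside the range are never touched.

```javascript
// Binary search: position of the first index entry whose key is
// strictly greater than threshold (entries sorted by key ascending).
function firstGreater(index, threshold) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key > threshold) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Index-scan step: returns index entries (keys + pointers), not documents.
function indexScanGreaterThan(index, threshold) {
  return index.slice(firstGreater(index, threshold));
}

// Fetch step: follows each pointer into the collection to load the document.
function fetchStage(entries, collection) {
  return entries.map((entry) => collection[entry.pointer]);
}

// Hypothetical sample collection; pointers are array positions.
const collection = [
  { name: "ada", dob: { age: 64 } },
  { name: "tom", dob: { age: 31 } },
  { name: "eve", dob: { age: 60 } },
  { name: "sam", dob: { age: 72 } },
];
const index = collection
  .map((doc, pointer) => ({ key: doc.dob.age, pointer }))
  .sort((a, b) => a.key - b.key);

const entries = indexScanGreaterThan(index, 60);
const docs = fetchStage(entries, collection);
console.log(entries.length, docs.length); // 2 2
console.log(docs.map((d) => d.name)); // [ 'ada', 'sam' ]
```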
108 00:07:39,700 --> 00:07:44,670 We also had to examine these documents, because the index only holds pointers to the documents. 109 00:07:44,680 --> 00:07:46,800 The index just narrows down the set; 110 00:07:46,810 --> 00:07:51,340 we still have to go to the collection and get the documents from there to return them in the end, 111 00:07:51,430 --> 00:07:55,590 but this sped up our query, and this is how an index can help us. 112 00:07:55,630 --> 00:08:00,460 Now, before we dive deeper into different types of indexes, let me also show you something interesting 113 00:08:00,460 --> 00:08:04,200 about this dataset, which helps you understand indexes a bit better.