1 00:00:02,080 --> 00:00:07,840 So we had a look at our first index and this index actually sped up our query, it sped up our query 2 00:00:08,230 --> 00:00:10,740 for people older than 60, 3 00:00:10,810 --> 00:00:17,320 now something interesting happens if we do run the same query for people older than 20. 4 00:00:17,540 --> 00:00:23,510 If I do run this query, it's still super fast but what we see is that we essentially seem to have no 5 00:00:23,510 --> 00:00:26,730 person in our dataset that is younger than 20, 6 00:00:26,810 --> 00:00:32,000 we'll have to go through the entire index, every single element matches our query and then we fetch documents 7 00:00:32,000 --> 00:00:32,590 for that 8 00:00:32,780 --> 00:00:38,270 and you see that the execution time for this is higher than it was for people older than 60. 9 00:00:38,270 --> 00:00:44,210 Now something interesting happens if we get rid of that index which we can do by reaching out to our 10 00:00:44,210 --> 00:00:50,390 database, to our collection and then running drop index and now here we specify the exact same document we 11 00:00:50,390 --> 00:00:51,580 used for creating it, 12 00:00:51,740 --> 00:00:57,180 so dob.age and then one and now you see that index was removed 13 00:00:57,290 --> 00:01:01,920 and now let's run the same query again or explain it, 14 00:01:01,920 --> 00:01:08,590 now that we have no index. What you'll see is that this now actually is faster, we get execution time of 6 milli 15 00:01:08,660 --> 00:01:10,080 seconds. 16 00:01:10,080 --> 00:01:15,140 Now the reason why this is faster is that we save the step of going for the index, 17 00:01:15,260 --> 00:01:22,490 if you have a query that will return a large portion or the majority of your documents, an index can 18 00:01:22,490 --> 00:01:28,910 actually be slower because you then just have an extra step to go through your almost entire 19 00:01:28,910 --> 00:01:34,370 index list and then you have to go to the collection and get all these documents, 20 00:01:34,430 --> 00:01:39,750 so you then just have an extra step because if you do a full collection scan, it can be faster 21 00:01:39,800 --> 00:01:44,990 and it certainly is if you return all elements but even for the majority it would be faster because 22 00:01:44,990 --> 00:01:51,950 with a full collection scan, you already have all the documents in memory and then an index doesn't offer you 23 00:01:51,950 --> 00:01:54,160 any more because that just is an extra step. 24 00:01:54,170 --> 00:01:56,340 Instead here we got all the documents in memory, 25 00:01:56,450 --> 00:02:01,130 we would have needed to go to the documents anyways to fetch them from the pointers the index gives 26 00:02:01,130 --> 00:02:01,640 us, 27 00:02:01,640 --> 00:02:05,900 so now we already have them and since we need most of them, this is now faster. 28 00:02:05,900 --> 00:02:11,960 So if you have queries that regularly return the majority or all of your documents, an index will not 29 00:02:11,960 --> 00:02:13,060 really help you there, 30 00:02:13,070 --> 00:02:15,540 it might even slow down the execution 31 00:02:15,740 --> 00:02:20,150 and that is important to keep in mind as a first restriction that you need to know when planning your 32 00:02:20,150 --> 00:02:20,850 queries. 33 00:02:21,020 --> 00:02:27,140 If you have a dataset where your queries typically only return fractions, like 10 or 20 percent or lower 34 00:02:27,140 --> 00:02:32,860 than that of the documents, then indexes will almost certainly always speed it up. 35 00:02:32,990 --> 00:02:37,200 If you've got a lot of queries that give you back all the documents or close to all the documents, 36 00:02:37,290 --> 00:02:42,740 indexes can't do that much work for you and logically, that makes sense because the idea of index is to 37 00:02:42,740 --> 00:02:48,710 quickly let you get to a narrow subset of your document list and not to the majority of that.