1
00:00:02,500 --> 00:00:07,450
So let's now have a look at indexes in practice and understand the impact they have and how we

2
00:00:07,450 --> 00:00:11,200
can create indexes. For this, you should have a running MongoDB server,

3
00:00:11,200 --> 00:00:17,050
I have mine running in a second tab, and then you find a starting dataset attached to this video,

4
00:00:17,080 --> 00:00:23,080
the persons.json file. Download that file and store it somewhere on your machine and then

5
00:00:23,080 --> 00:00:28,960
navigate in the terminal or command prompt into that folder where you have it stored so that you can

6
00:00:28,960 --> 00:00:31,860
easily import it with mongoimport.

7
00:00:32,050 --> 00:00:33,690
So simply type mongoimport,

8
00:00:33,760 --> 00:00:35,310
then the path to that file

9
00:00:35,330 --> 00:00:38,260
and now since I'm running this in the folder where the file is,

10
00:00:38,260 --> 00:00:43,900
I can just type the file name. Then the database where you want to store this and I'll simply create

11
00:00:43,900 --> 00:00:49,450
a new one, contact data and the collection which you want to create for that data and I'll name mine

12
00:00:49,450 --> 00:00:50,620
contacts.

13
00:00:50,620 --> 00:00:56,490
Now you also need to add --jsonArray at the end so that this gets imported correctly and it should import

14
00:00:56,500 --> 00:00:58,540
5000 documents.

15
00:00:58,570 --> 00:01:03,820
Now as you connect to your database thereafter, you should have that new contact data database which

16
00:01:03,820 --> 00:01:05,030
we can use now,

17
00:01:05,050 --> 00:01:10,520
so contact data and in there, you should have this contacts collection.

18
00:01:10,960 --> 00:01:12,950
So let's now use that contacts collection and

19
00:01:13,010 --> 00:01:18,250
let's first of all have a look at a single contact, for this I'll reach out to my contacts and find one

20
00:01:18,250 --> 00:01:18,990
element.
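
For reference, the steps so far might look roughly like this. It is only a sketch: I'm assuming the database name is spelled contactData, and the guard around db is just there so the snippet also loads outside the mongo shell.

```javascript
// Import step, run in the terminal (not in the mongo shell), from the folder
// that contains persons.json:
//   mongoimport persons.json -d contactData -c contacts --jsonArray

// Names used in this sketch; "contactData" is an assumed spelling.
const dbName = "contactData";
const collectionName = "contacts";

// `db` only exists inside the mongo shell, so the calls are guarded.
if (typeof db !== "undefined") {
  const contactDb = db.getSiblingDB(dbName);            // same effect as `use contactData`
  printjson(contactDb.getCollectionNames());            // should include "contacts"
  printjson(contactDb.getCollection(collectionName).findOne()); // one sample person
}
```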

21
00:01:19,330 --> 00:01:21,180
So this is essentially what we have in there,

22
00:01:21,190 --> 00:01:23,530
this is some random person data,

23
00:01:23,530 --> 00:01:29,800
each person has an ID, gender, a name which is actually a nested document as you can see, a location which

24
00:01:29,800 --> 00:01:33,630
is a nested document, even coordinates here,

25
00:01:33,790 --> 00:01:43,030
we got a time zone, an e-mail, login data, date of birth which holds both the date and the current age, some

26
00:01:43,030 --> 00:01:43,930
registration date,

27
00:01:43,930 --> 00:01:50,790
let's say this is for a web platform where people can manage their contacts and each contact therefore also

28
00:01:50,830 --> 00:01:54,700
has a date when the contact signed up and so on,

29
00:01:54,700 --> 00:01:58,570
so we get this basic dummy person data which we can use.

30
00:01:58,570 --> 00:02:03,940
So now let's run a query and let's find all people who are older than 60,

31
00:02:04,240 --> 00:02:11,650
so for this, I will clear my shell here and I will then reach out to contacts, run a find method here

32
00:02:12,250 --> 00:02:15,280
and I'm looking for the dob.age field,

33
00:02:15,280 --> 00:02:22,310
so for this field in the embedded document and I'll have a greater than query and the greater than query

34
00:02:22,310 --> 00:02:32,410
will be looking for people older than 60. Let's also pretty print this and we get a bunch of results, 61

35
00:02:32,430 --> 00:02:35,300
is the age here, 61 again, 

36
00:02:35,310 --> 00:02:39,240
so there are some people who are older than 60 as it seems.

37
00:02:39,240 --> 00:02:43,230
You can also have a look at how many results we got by adding count at the end,

38
00:02:43,240 --> 00:02:46,170
so 1222.
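
The query I just ran could be sketched like this. The database name contactData is an assumed spelling, and the guard only exists so the snippet loads outside the mongo shell as well.

```javascript
// Filter: the dotted key "dob.age" reaches into the embedded dob document.
const olderThan60 = { "dob.age": { $gt: 60 } };

// `db` only exists inside the mongo shell, so the calls are guarded.
if (typeof db !== "undefined") {
  const contacts = db.getSiblingDB("contactData").getCollection("contacts");
  contacts.find(olderThan60).limit(2).forEach(printjson); // print two matching people
  print(contacts.find(olderThan60).count());              // number of matches
}
```

Typed directly into the shell, this is simply find with that filter, followed by pretty or count.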

39
00:02:46,200 --> 00:02:51,420
Now of course this was a super fast query but we also don't have that many documents in this collection,

40
00:02:51,510 --> 00:02:53,240
that's important to keep in mind,

41
00:02:53,640 --> 00:03:01,620
now in order to determine whether an index can help us or to see what MongoDB actually does,

42
00:03:01,620 --> 00:03:06,990
MongoDB gives us a nice tool that we can use to analyze how it executed the query

43
00:03:07,260 --> 00:03:13,530
and this tool is a simple method we add to our query. Here after you reach out to the collection,

44
00:03:13,560 --> 00:03:17,400
you can add the explain method and then chain your normal query,

45
00:03:17,400 --> 00:03:19,830
explain works for find, update, delete

46
00:03:19,890 --> 00:03:20,940
not for insert,

47
00:03:20,940 --> 00:03:26,140
so it basically works for the methods where you narrow down documents, where you find documents

48
00:03:26,790 --> 00:03:37,920
and then here we can of course repeat our condition, looking for dob.age to be greater than 60.

49
00:03:37,920 --> 00:03:45,310
Now here we get the detailed description of what MongoDB did and how it derived our results,

50
00:03:45,540 --> 00:03:51,620
MongoDB thinks in so-called plans and plans are simply alternatives it considers for executing that query

51
00:03:52,280 --> 00:03:54,260
and in the end it will find a winning plan

52
00:03:54,300 --> 00:03:56,950
and I'll come back to how MongoDB determines this later

53
00:03:57,210 --> 00:04:00,720
and that winning plan is essentially what it did to get our results

54
00:04:00,810 --> 00:04:04,060
and you see here, the winning plan was to do a full collection scan.

55
00:04:04,470 --> 00:04:09,330
We could also have rejected plans but for this, we would need alternatives and without indexes, a full

56
00:04:09,330 --> 00:04:11,760
scan is always the only thing MongoDB can do,

57
00:04:11,760 --> 00:04:17,490
so there were no alternatives and therefore the only approach we had of course is the winning plan.
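
To make the plan output a bit more tangible, here is a small sketch that reads the winning and rejected plans out of an explain result. The helper name summarizePlans is made up for illustration, contactData is an assumed spelling, and the field layout shown is the classic one, which may look different on newer server versions.

```javascript
// Hypothetical helper: summarizes the queryPlanner part of an explain() result.
function summarizePlans(explainResult) {
  const planner = explainResult.queryPlanner;
  return {
    winningStage: planner.winningPlan.stage,     // e.g. "COLLSCAN" for a full scan
    rejectedCount: planner.rejectedPlans.length, // 0 when there were no alternatives
  };
}

// `db` only exists inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  const contacts = db.getSiblingDB("contactData").getCollection("contacts");
  // Typed directly in the shell: db.contacts.explain().find({ "dob.age": { $gt: 60 } })
  // The cursor form below returns the same explain document as a plain object.
  const result = contacts.find({ "dob.age": { $gt: 60 } }).explain();
  printjson(summarizePlans(result));
}
```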

58
00:04:17,490 --> 00:04:22,010
Now we can get even more detailed output by re-running that command

59
00:04:22,050 --> 00:04:27,690
but now we can pass an argument to explain and that argument is a string where we control the verbosity

60
00:04:27,690 --> 00:04:28,810
of this command.

61
00:04:28,950 --> 00:04:35,010
If you pass "executionStats" here, make sure you get that typed correctly with a capital S and a lower

62
00:04:35,010 --> 00:04:35,840
case e,

63
00:04:36,180 --> 00:04:41,780
you find a detailed output for this query and how the results were returned.

64
00:04:41,860 --> 00:04:47,710
There you'll see that the overall query took 5 milliseconds which is of course super fast but our collection

65
00:04:47,850 --> 00:04:48,770
is also not very big

66
00:04:48,880 --> 00:04:54,270
and if it were bigger, if it had millions of documents, this number would of course scale up and you see

67
00:04:54,340 --> 00:05:00,130
that we had to look at 5000 documents in order to return our 1222,

68
00:05:00,160 --> 00:05:06,500
so there's quite a big gap here and this already is a sign that this is a kind of an inefficient query.
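
The numbers I just pointed at can be pulled out like this. Again only a sketch: summarizeStats is a made-up helper name and contactData is an assumed spelling.

```javascript
// Hypothetical helper: picks the interesting numbers out of executionStats.
function summarizeStats(explainResult) {
  const stats = explainResult.executionStats;
  return {
    millis: stats.executionTimeMillis,    // overall execution time
    keysExamined: stats.totalKeysExamined, // index keys looked at
    docsExamined: stats.totalDocsExamined, // documents looked at
    returned: stats.nReturned,             // documents returned
  };
}

// `db` only exists inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  const contacts = db.getSiblingDB("contactData").getCollection("contacts");
  const result = contacts.find({ "dob.age": { $gt: 60 } }).explain("executionStats");
  printjson(summarizeStats(result));
}
```

A big gap between docsExamined and returned, like 5000 versus 1222 here, is the hint that the query does more work than necessary.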

69
00:05:06,520 --> 00:05:09,760
Now let's add an index and see how this changes things,

70
00:05:09,760 --> 00:05:16,180
we add an index to a collection by typing db.contacts and then createIndex.

71
00:05:16,420 --> 00:05:19,240
Now an index is defined as a document here

72
00:05:19,360 --> 00:05:20,720
and the first value,

73
00:05:20,740 --> 00:05:24,730
the key here is the name of the field you want to create an index on

74
00:05:24,940 --> 00:05:26,390
and in my case that is

75
00:05:26,410 --> 00:05:26,950
dob.age.

76
00:05:27,100 --> 00:05:32,430
So you see you can create indexes on embedded fields just as you could use a normal field,

77
00:05:32,500 --> 00:05:39,280
so you can use top level fields, you can use embedded fields, doesn't matter. Then the value is whether

78
00:05:39,360 --> 00:05:46,210
MongoDB should create that list of values in that age field in an ascending or descending order,

79
00:05:46,240 --> 00:05:49,310
so it can sort in ascending or descending order.

80
00:05:49,480 --> 00:05:52,120
If you add a one here, it'll be ascending order,

81
00:05:52,210 --> 00:05:53,770
so lower values come first,

82
00:05:53,770 --> 00:05:55,470
higher values towards the end,

83
00:05:55,660 --> 00:06:01,150
if you add a -1 here, it's descending. What you choose here in the end doesn't matter too much

84
00:06:01,240 --> 00:06:06,850
even if you do sort your results and you sort to the opposite direction, it will still be sped up because

85
00:06:06,850 --> 00:06:10,000
MongoDB can traverse that index in both directions,

86
00:06:10,000 --> 00:06:14,950
so you can actually choose what you want here and I'll go for ascending.
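
As a sketch, creating that index looks like this. contactData is an assumed spelling, and the guard is only there so the snippet loads outside the mongo shell.

```javascript
// Index definition: field name as key, 1 for ascending (-1 would be descending).
const ageIndex = { "dob.age": 1 };

// `db` only exists inside the mongo shell, so the calls are guarded.
if (typeof db !== "undefined") {
  const contacts = db.getSiblingDB("contactData").getCollection("contacts");
  printjson(contacts.createIndex(ageIndex)); // result of the index creation
  printjson(contacts.getIndexes());          // lists all indexes on the collection
}
```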

87
00:06:14,950 --> 00:06:18,500
Now this created an index, you see we had one before,

88
00:06:18,580 --> 00:06:20,890
we'll see which index that was in a second,

89
00:06:20,890 --> 00:06:22,520
now we have two.

90
00:06:22,660 --> 00:06:26,410
So with that, let's repeat our explain command here

91
00:06:26,410 --> 00:06:34,120
for people older than 60 and if we do that, we should see that now the execution time is down significantly

92
00:06:34,330 --> 00:06:35,640
from 5 to 3 milliseconds,

93
00:06:35,770 --> 00:06:38,050
so obviously we're talking about small numbers here

94
00:06:38,170 --> 00:06:41,360
but the main thing is it was sped up.

95
00:06:41,380 --> 00:06:44,980
We also see that there are two execution stages now

96
00:06:45,340 --> 00:06:51,700
and we also see that the first stage, the input stage was an index scan, in the other output, that would

97
00:06:51,700 --> 00:06:53,800
be our winning plan essentially.

98
00:06:53,800 --> 00:06:58,620
So it did not do a full collection scan but instead an index scan

99
00:06:58,660 --> 00:07:05,260
and there you see that it returned 1222 documents or not documents to be precise

100
00:07:05,320 --> 00:07:09,850
but keys in the index with their respective pointers at documents,

101
00:07:10,000 --> 00:07:15,380
so the index scan does not return the documents but just the keys in the index and the pointers to the

102
00:07:15,370 --> 00:07:16,270
documents.

103
00:07:16,420 --> 00:07:17,530
It's the next stage,

104
00:07:17,590 --> 00:07:22,720
the fetch stage which will then take these pointers returned from the index and reach out to the actual

105
00:07:22,720 --> 00:07:26,060
collection and then fetch the real documents from there

106
00:07:26,170 --> 00:07:37,210
and therefore in the end, we see here we had to only look at 1222 keys in our index to reach 1222

107
00:07:37,210 --> 00:07:39,500
documents which are returned.

108
00:07:39,700 --> 00:07:44,670
We also had to look at these documents because the index only has the pointers to the documents,

109
00:07:44,680 --> 00:07:46,800
so the index just narrows down the set,

110
00:07:46,810 --> 00:07:51,340
we still have to go through the collection and get the documents from there to return them in the end

111
00:07:51,430 --> 00:07:55,590
but this sped up our query and this is how an index can help us.
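
To picture the two stages, here is a toy in-memory model in plain JavaScript. It is not how MongoDB is implemented internally, just a sketch of the idea: the index is a sorted list of keys with pointers, the index scan narrows down the set, and the fetch stage follows the pointers. The sample ages and all function names are made up.

```javascript
// Six made-up person documents; only dob.age matters here.
const people = [34, 61, 72, 45, 60, 66].map((age, i) => ({ _id: i, dob: { age } }));

// Full collection scan: every document has to be examined.
function collectionScan(docs, minAge) {
  let docsExamined = 0;
  const results = [];
  for (const doc of docs) {
    docsExamined++;
    if (doc.dob.age > minAge) results.push(doc);
  }
  return { results, docsExamined };
}

// Index: keys in ascending order, each with a pointer (array position) to its document.
function buildAgeIndex(docs) {
  return docs
    .map((doc, position) => ({ key: doc.dob.age, pointer: position }))
    .sort((a, b) => a.key - b.key);
}

// Index scan, then fetch: find the first key above minAge, follow the pointers.
function indexScanThenFetch(docs, index, minAge) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {                       // binary search on the sorted keys
    const mid = (lo + hi) >> 1;
    if (index[mid].key > minAge) hi = mid;
    else lo = mid + 1;
  }
  const entries = index.slice(lo);                      // index scan: keys plus pointers
  const results = entries.map((e) => docs[e.pointer]);  // fetch: the real documents
  return { results, keysExamined: entries.length, docsExamined: results.length };
}

const scan = collectionScan(people, 60);
const viaIndex = indexScanThenFetch(people, buildAgeIndex(people), 60);
console.log(scan.docsExamined, scan.results.length);         // 6 3
console.log(viaIndex.keysExamined, viaIndex.results.length); // 3 3
```

Both approaches return the same three people, but the scan has to look at all six documents while the index version only touches three keys and three documents.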

112
00:07:55,630 --> 00:08:00,460
Now before we dive deeper into different types of indexes, let me also show you something interesting

113
00:08:00,460 --> 00:08:04,200
about this dataset which helps you understand indexes a bit better.