1
00:00:02,130 --> 00:00:04,240
Now that we had a look at covered queries,

2
00:00:04,320 --> 00:00:08,980
let's also see when mongodb actually rejects a plan.

3
00:00:09,000 --> 00:00:16,670
Now for that, I'll reuse my customers collection from the last video and remember there we have two indexes, the

4
00:00:16,760 --> 00:00:19,760
standard ID index and our own name index.

5
00:00:19,760 --> 00:00:27,050
Now I'll also add additional compound index on customers with create index and I will add age here

6
00:00:27,140 --> 00:00:29,060
in ascending order and name,

7
00:00:29,060 --> 00:00:34,910
now please note that the order here is important for compound indexes and if I would have put name

8
00:00:34,910 --> 00:00:41,250
first, this index here wouldn't make much sense because then I would have a single field index or

9
00:00:41,250 --> 00:00:42,080
name

10
00:00:42,090 --> 00:00:47,300
even though the name is the first field in a compound index and you learned from left to right, a compound

11
00:00:47,320 --> 00:00:52,490
index can be used well standalone, so each field can be used standalone from left to right.

12
00:00:52,610 --> 00:00:58,370
So in this case, if age comes first, we can also filter just for age and take advantage of this index, for

13
00:00:58,370 --> 00:00:59,060
name,

14
00:00:59,090 --> 00:01:04,580
we can take advantage of this index because name is only the second value is mapped to the respective

15
00:01:04,610 --> 00:01:07,680
ages and only sorted within the age category,

16
00:01:07,700 --> 00:01:13,750
so if you filtered for just name and you didn't have that index, name could not be supported by index.

17
00:01:13,760 --> 00:01:18,770
Here it can because we have a single field index and we also get this compound index for the combination

18
00:01:18,770 --> 00:01:20,850
of age and name or just age.

19
00:01:21,140 --> 00:01:23,460
So now I create this compound index here

20
00:01:23,750 --> 00:01:26,570
and now I have three indexes of course.

21
00:01:26,570 --> 00:01:29,860
Now let's search or let's explain,

22
00:01:30,000 --> 00:01:41,710
so customers explain, let's search for all persons with a name and an age, so name maybe Max and age 30.

23
00:01:42,210 --> 00:01:50,020
If I hit enter, you'll see the winning plan is index scan where I use the compound index

24
00:01:50,120 --> 00:01:56,930
and as I mentioned in the compound index lecture, the order in which you specify your arguments when writing

25
00:01:56,930 --> 00:02:02,840
the query does not matter, our compound index was created with age as the first value and name as a second

26
00:02:03,020 --> 00:02:06,150
and then the query I have name as the first value and age is the second

27
00:02:06,260 --> 00:02:11,480
but here in the query, the order does not matter, it still is able to use that compound index because

28
00:02:11,480 --> 00:02:13,280
we have a simple and condition here

29
00:02:13,370 --> 00:02:18,480
and you can of course reverse that and mongodb does reverse it for us, automatically so to say.

30
00:02:18,680 --> 00:02:23,000
Therefore the winning plan is an index scan using that compound index.

31
00:02:23,000 --> 00:02:29,660
Now what we can see is that we now also have a rejected plan and the rejected plan was to use an

32
00:02:29,660 --> 00:02:35,450
index scan just on the name index, that is the single field name index I created.

33
00:02:35,450 --> 00:02:41,430
Of course it made sense for mongodb to also consider this because we have that name index,

34
00:02:41,510 --> 00:02:44,880
we had a query that included a search for name,

35
00:02:44,930 --> 00:02:48,040
so it made sense to consider that name index

36
00:02:48,140 --> 00:02:52,970
but of course since we have another index which simply fits this query better, this compound index,

37
00:02:53,270 --> 00:02:58,750
mongodb chose to use this one as a winning plan and rejected this one.

38
00:02:58,760 --> 00:03:00,490
Now this is interesting to know,

39
00:03:00,560 --> 00:03:08,450
the question of course is how exactly does mongodb figure out which plan is better? For this, mongodb

40
00:03:08,450 --> 00:03:10,820
uses an approach where it simply

41
00:03:10,820 --> 00:03:15,790
first of all looks for indexes that could help you with the query at hand,

42
00:03:15,830 --> 00:03:23,120
so in our last lecture and example, we had the name single field index and age and name as a compound index.

43
00:03:23,180 --> 00:03:29,540
Since our query, our find method included a look for name, mongodb automatically derived that

44
00:03:29,540 --> 00:03:35,370
both the name single field index and the age name compound index could be helpful.

45
00:03:35,390 --> 00:03:40,730
So it came up with two approaches, for other scenarios you might of course have more approaches,

46
00:03:40,760 --> 00:03:46,490
so let's say we have three approaches, mongodb then simply let's those approaches race against each

47
00:03:46,490 --> 00:03:49,820
other but not for the full dataset

48
00:03:49,820 --> 00:03:55,040
but it sets a certain winning condition, right now that should be one hundred documents.

49
00:03:55,040 --> 00:04:02,210
So it looks who's the first to find 100 documents and of course one of the approaches will be the fastest

50
00:04:02,390 --> 00:04:09,030
to reach that threshold, mongodb will then basically use that approach for the real query

51
00:04:09,380 --> 00:04:14,060
and of course that would be cumbersome if it would have to do this for every find method, for every

52
00:04:14,060 --> 00:04:19,030
query you send to the database because that obviously costs a little bit of performance.

53
00:04:19,900 --> 00:04:22,130
Therefore mongodb caches 

54
00:04:22,180 --> 00:04:24,420
this winning plan for this exact query,

55
00:04:24,460 --> 00:04:29,440
so for exactly the query you send with the fields you were looking for and the values for the fields

56
00:04:29,440 --> 00:04:30,520
you were looking for,

57
00:04:30,520 --> 00:04:37,480
so it caches this plan as the winner plan for this kind of query. And for future queries that are looking

58
00:04:37,480 --> 00:04:43,450
exactly equal, it uses this winning plan, for future queries that look different, that use different values

59
00:04:43,450 --> 00:04:44,650
or different keys,

60
00:04:44,650 --> 00:04:49,120
it will of course race again and find a winning plan for that type of query.

61
00:04:49,120 --> 00:04:51,860
Now of course this cache is not there forever,

62
00:04:52,210 --> 00:04:56,830
it is cleared after a certain amount of inserts or a db restart.

63
00:04:57,130 --> 00:05:00,130
To be precise, instead of being stored forever,

64
00:05:00,340 --> 00:05:07,360
this winning plan is removed from cache after you wrote a certain amount of documents to that collection,

65
00:05:07,680 --> 00:05:09,480
currently there should be one thousand,

66
00:05:09,490 --> 00:05:15,190
so after you added that many documents, mongodb says I don't know if the current winning plan will still

67
00:05:15,190 --> 00:05:17,410
win because the collection changed a lot,

68
00:05:17,410 --> 00:05:19,360
so I should reconsider.

69
00:05:19,450 --> 00:05:25,840
The same happens if you rebuild the index, so if you drop it and recreate it for example,

70
00:05:25,840 --> 00:05:31,640
it also gets deleted if you add other indexes because that new index could be better,

71
00:05:31,660 --> 00:05:39,730
so the cache for indexes or winning plans I should say gets cleared when you add new indexes or when

72
00:05:39,730 --> 00:05:41,620
you restart the mongodb server,

73
00:05:41,710 --> 00:05:44,530
then this also gets reset.

74
00:05:44,530 --> 00:05:46,260
So this is how mongodb derives

75
00:05:46,270 --> 00:05:48,240
the winning plan and how it stores it

76
00:05:48,390 --> 00:05:55,660
and it's interesting for you as a developer to regularly check your queries, your find or update or delete

77
00:05:55,660 --> 00:05:59,390
queries and see what mongodb actually does,

78
00:05:59,560 --> 00:06:05,710
if it uses indexes efficiently, if maybe a new index should be added, something you can do on your own

79
00:06:05,710 --> 00:06:10,160
if you own the database or you pass that information to your db administrator

80
00:06:10,420 --> 00:06:15,940
or if you maybe need to adjust the query. Maybe you're always fetching data that you don't really need

81
00:06:16,240 --> 00:06:20,070
and you could get a covered query if you just would project the data you need

82
00:06:20,110 --> 00:06:22,800
which happens to be the data stored in the index.

83
00:06:22,810 --> 00:06:27,610
That is why you as a developer need to know how indexes work because either you need to create them

84
00:06:27,610 --> 00:06:33,970
on your own in your next project on which you work alone or because you can optimize your queries or

85
00:06:34,060 --> 00:06:37,510
tell your administrator to optimize the indexes.

86
00:06:37,510 --> 00:06:44,020
Now with that information gained, let's have a look at the last level of verbosity the explain method offers

87
00:06:44,500 --> 00:06:50,500
and that is that we can add all plans execution as a value here.

88
00:06:50,500 --> 00:06:56,140
Now what this will do is it will give us a bunch of output because here, we get detailed statistics for all

89
00:06:56,140 --> 00:06:57,020
plans,

90
00:06:57,070 --> 00:07:04,330
also the rejected plans. So there we can see in detail how would an index scan on our compound index

91
00:07:04,330 --> 00:07:10,330
perform but then also what would happen if we had that index scan on the name index,

92
00:07:10,330 --> 00:07:11,700
how long would that take,

93
00:07:11,710 --> 00:07:16,630
how many documents would it there consider documents and how many keys would it consider,

94
00:07:16,630 --> 00:07:17,770
how long would it take,

95
00:07:17,770 --> 00:07:22,630
here we have zero because it's so super fast but for larger collections, you would of course see a difference.

96
00:07:22,720 --> 00:07:29,740
And with that, you can get detailed analytics on the different indexes and queries and possible ways

97
00:07:29,740 --> 00:07:34,540
of running your query and therefore you should have all the tools you need to optimize your queries

98
00:07:34,810 --> 00:07:35,620
and your indexes.