1
00:00:02,240 --> 00:00:06,400
We had a look at a lot of operators and a lot of pipeline stages

2
00:00:06,500 --> 00:00:13,010
and of course, I encourage you to dive into the official docs. Now I want to have a look at some stages

3
00:00:13,040 --> 00:00:16,870
which we saw before but which you have to get

4
00:00:16,880 --> 00:00:17,740
right

5
00:00:17,840 --> 00:00:21,790
and therefore let's see how we work with these stages.

6
00:00:21,920 --> 00:00:33,350
Let's say we want to find the 10 users, the 10 persons with the oldest birth date, so the lowest birth date

7
00:00:33,350 --> 00:00:37,050
so to say and thereafter we want to find the next 10,

8
00:00:37,160 --> 00:00:45,830
so like if we had pagination in place. Now for that, first of all, I will add a project stage to convert

9
00:00:45,830 --> 00:00:47,330
my date,

10
00:00:47,330 --> 00:00:53,580
I also could sort it whilst it's in string form but I still want to convert it also to practice this again.

11
00:00:54,080 --> 00:00:55,860
I'm not interested in my ID,

12
00:00:56,180 --> 00:01:08,560
I will keep my name and then I'll have my birth date field which I will convert to a date and the input

13
00:01:08,560 --> 00:01:14,880
for $toDate is dob.date of course, referring to this field here.

14
00:01:16,120 --> 00:01:17,460
Now let's try that out,

15
00:01:17,650 --> 00:01:23,740
if I run this command, I get persons where the name and the birth date are available,

16
00:01:23,740 --> 00:01:27,320
now we can sort on this birth date.

17
00:01:27,490 --> 00:01:36,610
So let's add a new stage here, sort and let's sort by birth date as I just said,

18
00:01:36,610 --> 00:01:44,090
so we can simply say birth date, referring to our newly added field here in ascending order so that the

19
00:01:44,200 --> 00:01:47,780
lowest birth date comes first.

20
00:01:47,870 --> 00:01:55,100
If I do that, well then I have a bunch of birth dates here which look pretty low and keep in mind, we

21
00:01:55,100 --> 00:01:58,630
only see 20 results here because we get back a cursor,

22
00:01:58,640 --> 00:02:02,540
so we get pretty old persons or pretty old people there.

23
00:02:02,660 --> 00:02:10,400
Thanks to my date ordering, the very oldest person is Mrs Victoria Hale because she was born on 7 September

24
00:02:10,790 --> 00:02:18,290
and even though this person here, I'm sorry I don't know how to pronounce this, even though this person

25
00:02:18,350 --> 00:02:21,340
is equally old in terms of years,

26
00:02:21,370 --> 00:02:24,550
it's not the case in terms of days.

27
00:02:24,770 --> 00:02:28,100
So this is the oldest person in the dataset,

28
00:02:28,150 --> 00:02:30,870
now I only want to see the top ten

29
00:02:31,220 --> 00:02:36,740
and for this, we can add another stage which we saw before with the find method but haven't seen here

30
00:02:36,740 --> 00:02:39,650
yet, the limit stage.

31
00:02:39,650 --> 00:02:41,420
Now limit is pretty straightforward,

32
00:02:41,450 --> 00:02:49,960
we just define how many entries we want to see, 10 let's say. If I copy that and I print that out,

33
00:02:50,080 --> 00:02:52,210
we now see 10 records only,

34
00:02:52,270 --> 00:02:58,340
there is no "Type it for more" prompt anymore because we exhausted our cursor here, since we only get 10

35
00:02:58,960 --> 00:03:01,620
and therefore, this is what we get back.

36
00:03:01,690 --> 00:03:06,270
So now we have the top 10 oldest people in our dataset,

37
00:03:06,280 --> 00:03:12,740
now let me quickly change that name to make it a bit more readable,

38
00:03:12,750 --> 00:03:15,330
I will use concat here to build a name,

39
00:03:15,360 --> 00:03:21,240
I will not do the whole change with the uppercase starting characters to keep this a bit smaller but I

40
00:03:21,240 --> 00:03:23,930
will point at name first,

41
00:03:24,270 --> 00:03:27,550
add a whitespace and then use name

42
00:03:27,750 --> 00:03:30,450
last here simply because

43
00:03:30,450 --> 00:03:35,580
now if I use this stage, we have single lines for the persons.

44
00:03:35,610 --> 00:03:39,780
So now I got my top 10 here which is great of course,

45
00:03:39,780 --> 00:03:43,100
now let's say we want to see the next 10.

46
00:03:43,170 --> 00:03:50,720
We can do this with another stage and that is the skip stage which we have to add prior to limit,

47
00:03:51,060 --> 00:03:56,510
so now we skip the first ten records and show the next 10.

48
00:03:56,620 --> 00:04:05,430
If I now copy this and I paste it in, we see different names because we simply skipped to the next page you

49
00:04:05,430 --> 00:04:06,520
could say.

50
00:04:06,840 --> 00:04:12,870
Now what's really important here is the order of skip, limit and sort.

51
00:04:12,870 --> 00:04:22,650
If I put skip after limit like this and I copy that, I get back no results because what happens here

52
00:04:22,650 --> 00:04:26,150
is I do my projection, I sort

53
00:04:26,250 --> 00:04:30,060
then I fetch 10 persons and then I skip 10 persons,

54
00:04:30,120 --> 00:04:31,180
so what's remaining,

55
00:04:31,320 --> 00:04:33,040
well zero.

56
00:04:33,210 --> 00:04:34,950
So that is why this order is important,

57
00:04:35,010 --> 00:04:36,750
the order did not matter on the

58
00:04:36,880 --> 00:04:43,400
find method, there you could chain the skip and limit methods onto your cursor in any order you wanted. Here

59
00:04:43,530 --> 00:04:49,170
it does matter because your pipeline is processed step by step.

60
00:04:49,380 --> 00:04:51,820
The same of course is true for sorting,

61
00:04:51,930 --> 00:05:02,130
if I do sort after this, so after skipping and limiting, I get a totally different set of results because

62
00:05:02,130 --> 00:05:10,110
if I now execute this, I have people who are not that old, they were born in the 1980s, even 1990s. The reason

63
00:05:10,110 --> 00:05:16,900
for that is that I simply skip the first ten persons in my dataset as it's stored in the collection

64
00:05:17,280 --> 00:05:23,140
and then I take the next 10 and I only sort these 10 persons.

65
00:05:23,180 --> 00:05:28,130
So this is really important to understand, the same of course would be the case if you have a match in

66
00:05:28,130 --> 00:05:36,480
there. If we want to find the oldest males, let's say, we should include a match stage and we should include

67
00:05:36,530 --> 00:05:41,210
that early to limit the amount of data we have to work with in the other stages.

68
00:05:41,340 --> 00:05:47,560
So here, I can match for gender being equal to male of course

69
00:05:47,560 --> 00:05:52,550
and if I do that, everything will be fine and I get the 10

70
00:05:52,550 --> 00:05:53,600
oldest men here,

71
00:05:53,630 --> 00:05:58,730
well not right now because the order is incorrect, we just broke that right

72
00:05:58,850 --> 00:06:00,480
but we do match.

73
00:06:00,590 --> 00:06:11,130
However, if I fix the order like this and execute it, we now do have the 10 oldest men

74
00:06:11,240 --> 00:06:18,560
and as I mentioned, the order does matter because if I do match after sorting, which I can do, I can filter

75
00:06:18,560 --> 00:06:22,920
at any point in time, I don't have to do this at the beginning but if I do match here,

76
00:06:24,920 --> 00:06:26,840
I get back no results at all

77
00:06:26,870 --> 00:06:31,040
and the reason for that is that gender is not included in my projection stage,

78
00:06:31,160 --> 00:06:33,920
so I try to match something which is not there.

79
00:06:34,250 --> 00:06:41,630
If I do add gender here in my projection stage, then I will get results

80
00:06:41,800 --> 00:06:49,480
but now I still have a sub-optimal setup because I do match only after transforming all the elements

81
00:06:49,570 --> 00:06:50,920
in my collection.

82
00:06:50,920 --> 00:06:54,030
So I will have projected all the females here too

83
00:06:54,160 --> 00:06:56,780
even though later, I'm only interested in the males,

84
00:06:56,890 --> 00:07:04,000
so I should reverse this. Now actually MongoDB will do some optimizations for you,

85
00:07:04,000 --> 00:07:09,910
you'll find details in the next lecture which has an article. MongoDB does some optimizations for you

86
00:07:10,000 --> 00:07:15,850
to optimize your pipeline, so it might very well have fixed this issue here for us but you shouldn't

87
00:07:15,850 --> 00:07:21,430
rely too much on that and you should try to build correct pipelines with the correct order that optimizes

88
00:07:21,430 --> 00:07:22,600
for performance

89
00:07:22,600 --> 00:07:26,970
and of course, well, builds the kind of structure you want to have.