1 00:00:02,240 --> 00:00:06,400 We had a look at a lot of operators and a lot of pipeline stages 2 00:00:06,500 --> 00:00:13,010 and of course, I encourage you to dive into the official docs. Now I want to have a look at some stages 3 00:00:13,040 --> 00:00:16,870 which we saw before but which you have to get 4 00:00:16,880 --> 00:00:17,740 right 5 00:00:17,840 --> 00:00:21,790 and therefore let's see how we work with these stages. 6 00:00:21,920 --> 00:00:33,350 Let's say we want to find the 10 users, the 10 persons with the oldest birth date, so the lowest birth date 7 00:00:33,350 --> 00:00:37,050 so to speak and thereafter we want to find the next 10, 8 00:00:37,160 --> 00:00:45,830 so like if we had pagination in place. Now for that, first of all, I will add a project stage to convert 9 00:00:45,830 --> 00:00:47,330 my date, 10 00:00:47,330 --> 00:00:53,580 I also could sort it whilst it's in string form but I still want to convert it also to practice this again. 11 00:00:54,080 --> 00:00:55,860 I'm not interested in my ID, 12 00:00:56,180 --> 00:01:08,560 I will keep my name and then I'll have my birth date field which I will convert to a date and the input 13 00:01:08,560 --> 00:01:14,880 for $toDate is dob.date of course, referring to this field here. 14 00:01:16,120 --> 00:01:17,460 Now let's try that out, 15 00:01:17,650 --> 00:01:23,740 if I run this command, I get persons where the name and the birth date are available, 16 00:01:23,740 --> 00:01:27,320 now we can sort on this birth date. 17 00:01:27,490 --> 00:01:36,610 So let's add a new stage here, sort and let's sort by birth date as I just said, 18 00:01:36,610 --> 00:01:44,090 so we can simply say birth date, referring to our newly added field here in ascending order so that the 19 00:01:44,200 --> 00:01:47,780 lowest birth date comes first. 
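The projection and sort described so far can be sketched as a pipeline array. This is only a rough sketch: the collection name `persons`, the source fields `name` and `dob.date`, and the output field name `birthdate` are assumptions based on what is said in the recording, not confirmed names.

```javascript
// Sketch of the first two stages; field and collection names are assumptions.
const projectAndSort = [
  {
    $project: {
      _id: 0,                              // drop the ID
      name: 1,                             // keep the name as it is stored
      birthdate: { $toDate: "$dob.date" }  // convert the stored string to a date
    }
  },
  { $sort: { birthdate: 1 } }              // ascending: lowest (oldest) birth date first
];

// In the shell this would be run roughly as:
// db.persons.aggregate(projectAndSort)
```

Keeping the pipeline in a variable makes it easier to add or reorder stages later, which matters for the steps that follow.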
20 00:01:47,870 --> 00:01:55,100 If I do that, well then I have a bunch of birth dates here which look pretty low and keep in mind, we 21 00:01:55,100 --> 00:01:58,630 only see 20 results here because we get back a cursor, 22 00:01:58,640 --> 00:02:02,540 so we get pretty old persons or pretty old people there. 23 00:02:02,660 --> 00:02:10,400 Thanks to my date ordering, the very oldest person is Mrs Victoria Hale because she's born on 7 September 24 00:02:10,790 --> 00:02:18,290 and even though this person here, I'm sorry I don't know how to pronounce this, even though this person 25 00:02:18,350 --> 00:02:21,340 is equally old in terms of years, 26 00:02:21,370 --> 00:02:24,550 it's not the case in terms of days. 27 00:02:24,770 --> 00:02:28,100 So this is the oldest person in the dataset, 28 00:02:28,150 --> 00:02:30,870 now I only want to see the top ten 29 00:02:31,220 --> 00:02:36,740 and for this, we can add another stage which we saw before with the find method but haven't seen here 30 00:02:36,740 --> 00:02:39,650 yet, the limit stage. 31 00:02:39,650 --> 00:02:41,420 Now limit is pretty straightforward, 32 00:02:41,450 --> 00:02:49,960 we just define how many entries we want to see, 10 let's say. If I copy that and I print that out, 33 00:02:50,080 --> 00:02:52,210 we now see 10 records only, 34 00:02:52,270 --> 00:02:58,340 there is no type it to see more anymore because we exhausted our cursor here because we only see 10 35 00:02:58,960 --> 00:03:01,620 and therefore, this is what we get back. 
36 00:03:01,690 --> 00:03:06,270 So now we have the top 10 oldest people in our dataset, 37 00:03:06,280 --> 00:03:12,740 now let me quickly change that name to make it a bit more readable, 38 00:03:12,750 --> 00:03:15,330 I will use concat here to build a name, 39 00:03:15,360 --> 00:03:21,240 I will not do the whole change with the uppercase starting characters to keep this a bit smaller but I 40 00:03:21,240 --> 00:03:23,930 will point at name first, 41 00:03:24,270 --> 00:03:27,550 add a whitespace and then use name 42 00:03:27,750 --> 00:03:30,450 last here simply because 43 00:03:30,450 --> 00:03:35,580 now if I use this stage, we have single lines for the persons. 44 00:03:35,610 --> 00:03:39,780 So now I got my top 10 here which is great of course, 45 00:03:39,780 --> 00:03:43,100 now let's say we want to see the next 10. 46 00:03:43,170 --> 00:03:50,720 We can do this with another stage and that is the skip stage which we have to add prior to limit, 47 00:03:51,060 --> 00:03:56,510 so now we skip the first ten records and show the next 10. 48 00:03:56,620 --> 00:04:05,430 If I now copy this and I paste it in, we see different names because we simply skipped to the next page you 49 00:04:05,430 --> 00:04:06,520 could say. 50 00:04:06,840 --> 00:04:12,870 Now what's really important here is the order of skip, limit and sort. 51 00:04:12,870 --> 00:04:22,650 If I had skip after limit like this and I copy that, I get back no results because what happens here 52 00:04:22,650 --> 00:04:26,150 is I do my projection, I sort 53 00:04:26,250 --> 00:04:30,060 then I fetch 10 persons and then I skip by 10 persons, 54 00:04:30,120 --> 00:04:31,180 so what's remaining, 55 00:04:31,320 --> 00:04:33,040 well zero. 
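The paging idea with skip before limit could be sketched like this. `pagePipeline` is a hypothetical helper introduced only for illustration, and the field names (`name.first`, `name.last`, `dob.date`, `birthdate`) and the `persons` collection are assumptions.

```javascript
// Hypothetical helper: builds the pipeline for one page of results.
function pagePipeline(pageNumber, pageSize) {
  return [
    {
      $project: {
        _id: 0,
        // join first and last name with a space in between
        name: { $concat: ["$name.first", " ", "$name.last"] },
        birthdate: { $toDate: "$dob.date" }
      }
    },
    { $sort: { birthdate: 1 } },             // oldest first
    { $skip: (pageNumber - 1) * pageSize },  // must come before $limit
    { $limit: pageSize }
  ];
}

// Second page of ten: skips the first 10 sorted documents, then takes 10.
const secondPage = pagePipeline(2, 10);

// In the shell this would be run roughly as:
// db.persons.aggregate(secondPage)
```

Swapping the last two stages would first cut the stream down to 10 documents and then skip all 10 of them, which is why that order returns nothing.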
56 00:04:33,210 --> 00:04:34,950 So that is why this order is important, 57 00:04:35,010 --> 00:04:36,750 the order did not matter on the 58 00:04:36,880 --> 00:04:43,400 find method, there you could chain the skip and limit methods onto your cursor as you wanted. Here 59 00:04:43,530 --> 00:04:49,170 it does matter because your pipeline is processed step by step. 60 00:04:49,380 --> 00:04:51,820 The same of course is true for sorting, 61 00:04:51,930 --> 00:05:02,130 if I do sort after this, so after skipping and limiting, I get a totally different set of results because 62 00:05:02,130 --> 00:05:10,110 if I now execute this, I have people who are not that old, they're born in the 1980s, even 1990s. The reason 63 00:05:10,110 --> 00:05:16,900 for that is that I simply skip the first ten persons in my dataset as it's stored in the collection 64 00:05:17,280 --> 00:05:23,140 and then I take the next 10 and I only sort these 10 persons. 65 00:05:23,180 --> 00:05:28,130 So this is really important to understand, the same of course would be the case if you have a match in 66 00:05:28,130 --> 00:05:36,480 there. If we want to find the oldest males, let's say, we should include a match stage and we should include 67 00:05:36,530 --> 00:05:41,210 that early to limit the amount of data we have to work with in the other stages. 68 00:05:41,340 --> 00:05:47,560 So here, I can match for gender being equal to male of course 69 00:05:47,560 --> 00:05:52,550 and if I do that, everything will be fine and I get the 10 70 00:05:52,550 --> 00:05:53,600 oldest men here, 71 00:05:53,630 --> 00:05:58,730 well not right now because the order is incorrect, we just broke that right 72 00:05:58,850 --> 00:06:00,480 but we do match. 
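The effect of stage order can be reproduced without a database. The following is a toy in-memory stand-in for sort, skip and limit in plain JavaScript, not MongoDB code; the birth years are made up and a page size of 2 is used to keep the data small.

```javascript
// Toy stand-ins for $sort, $skip and $limit (plain arrays, not MongoDB).
const stages = {
  sort: (docs) => [...docs].sort((a, b) => a.born - b.born),
  skip: (n) => (docs) => docs.slice(n),
  limit: (n) => (docs) => docs.slice(0, n)
};

// Apply the stages one after another, the way a pipeline is processed.
const run = (docs, pipeline) => pipeline.reduce((acc, stage) => stage(acc), docs);

// Made-up birth years, in "storage order".
const people = [1990, 1945, 1982, 1950, 1975, 1948].map((born) => ({ born }));

// Correct order: sort everything, then page.
const sortFirst = run(people, [stages.sort, stages.skip(2), stages.limit(2)]);

// Sort last: pages the unsorted data and only sorts that one page.
const sortLast = run(people, [stages.skip(2), stages.limit(2), stages.sort]);

// Limit before skip: takes 2 documents, then skips both of them.
const limitThenSkip = run(people, [stages.sort, stages.limit(2), stages.skip(2)]);

console.log(sortFirst.map((p) => p.born).join(", "));  // 1950, 1975
console.log(sortLast.map((p) => p.born).join(", "));   // 1950, 1982
console.log(limitThenSkip.length);                     // 0
```

With the sort moved to the end, 1982 shows up on the page because only the two documents picked in storage order get sorted, which mirrors the 1980s and 1990s birth dates seen in the recording.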
73 00:06:00,590 --> 00:06:11,130 However if I fix the order like this and I execute it then, now we do have the 10 oldest men 74 00:06:11,240 --> 00:06:18,560 and as I mentioned, the order does matter because if I do match after sorting, which I can do, I can filter 75 00:06:18,560 --> 00:06:22,920 at any point in time, I don't have to do this at the beginning but if I do match here, 76 00:06:24,920 --> 00:06:26,840 I get back no results at all 77 00:06:26,870 --> 00:06:31,040 and the reason for that is that gender is not included in my projection stage, 78 00:06:31,160 --> 00:06:33,920 so I try to match something which is not there. 79 00:06:34,250 --> 00:06:41,630 If I do add gender here in my projection stage, then I will get results 80 00:06:41,800 --> 00:06:49,480 but now I still have a sub-optimal setup because I do match only after transforming all the elements 81 00:06:49,570 --> 00:06:50,920 in my collection. 82 00:06:50,920 --> 00:06:54,030 So I will have projected all the females here too 83 00:06:54,160 --> 00:06:56,780 even though later, I'm only interested in the males, 84 00:06:56,890 --> 00:07:04,000 so I should reverse this. Now actually MongoDB will do some optimizations for you, 85 00:07:04,000 --> 00:07:09,910 you'll find details in the next lecture which has an article. MongoDB does some optimizations for you 86 00:07:10,000 --> 00:07:15,850 to optimize your pipeline, so it might very well have fixed this issue here for us but you shouldn't 87 00:07:15,850 --> 00:07:21,430 rely too much on that and you should try to build correct pipelines with the correct order that optimizes 88 00:07:21,430 --> 00:07:22,600 for performance 89 00:07:22,600 --> 00:07:26,970 and of course well, builds the kind of structure you want to have.
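Putting the match stage first might look like the sketch below. As before, the `persons` collection and the field names (`gender`, `name.first`, `name.last`, `dob.date`, `birthdate`) are assumptions rather than confirmed names.

```javascript
// Sketch: filter first, so later stages only handle the matching documents.
const oldestMen = [
  { $match: { gender: "male" } },  // runs on the original documents, where gender still exists
  {
    $project: {
      _id: 0,
      name: { $concat: ["$name.first", " ", "$name.last"] },
      birthdate: { $toDate: "$dob.date" }
      // gender is not projected, so a $match on gender after this stage would find nothing
    }
  },
  { $sort: { birthdate: 1 } },     // oldest first
  { $limit: 10 }                   // keep the 10 oldest
];

// In the shell this would be run roughly as:
// db.persons.aggregate(oldestMen)
```

Because the filter comes before the projection, only the matching documents are transformed and sorted, and the pipeline does not depend on the optimizer moving the match stage forward.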