This is the aggregation function as we executed it in the last lecture with group. Now as you saw there, we lost all the existing data, but that made sense because we grouped our data together, and if you do such a grouping, you will typically be fine with losing the data.

Now of course, when we ran that method here, when we ran our pipeline like this, what we got was a bunch of outputs in a totally unsorted order. Of course we can also sort, and here you already see the advantage of the aggregation pipeline. You can sort at any place, but we probably want to sort by total persons now, and this is something we can only do after having grouped. We can't run the sort on our input data, because that would just be the person documents, and there we can sort on things like the age and so on, but we can't sort on the amount of persons in a state, because that is a result we only derive here.

So what we can do is add a new pipeline stage, the sort stage. The sort stage also takes a document as an input to define how the sorting should happen, and you can simply sort as you sorted before. So you can say: I now want to sort by total persons, referring to the field which we introduced in the last pipeline stage. This is not a field that exists in our input dataset, but that does not matter, because as you learned, each pipeline stage passes some output data to the next stage, and that output data is the only data the next stage has. So this sort stage does not have access to the original data as we fetched it from the collection; it only has access to the output data of our group stage. There, we will have a total persons field, and we can now sort by this in descending order to have the highest values first.

If we now copy that over into our shell, we indeed see that we have some sorted results. So you see we got a bunch of results: that is the first one with 33 persons in this state, then we got 22 persons in this state, and here we got a bit of a longer state name, hence the different output, but there we got 24 persons. So this looks alright to me; we see the sorting works, and the interesting thing here really is that the sorting was done on the output of our previous stage, so on the output of the group stage.
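As a reference, here is a minimal sketch of the pipeline at this point. The collection and field names (persons, location.state, totalPersons) are assumptions based on the dataset used in the earlier lectures; adjust them to your own data.

```js
// Sketch: group person documents by state, count them,
// then sort by that derived count in descending order.
// Names (persons, location.state, totalPersons) are assumed
// from the earlier lectures.
db.persons.aggregate([
  {
    $group: {
      _id: { state: "$location.state" }, // grouping key
      totalPersons: { $sum: 1 }          // derived field: documents per state
    }
  },
  {
    // $sort only sees the output of $group, so it can sort on
    // totalPersons even though the original documents never had it.
    $sort: { totalPersons: -1 } // -1 = descending, highest values first
  }
]).pretty();
```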
And I hope this already shows you how much power you have with these tools, because this is essentially a kind of operation we couldn't do with the normal find method: there, we can't group and then sort on the result of our group. With just find, we would have to do that in the client-side code, whereas with aggregate, we can run it on the MongoDB server and then simply get back the data we need in our client to work with.
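To make that contrast concrete, here is a rough sketch of what the equivalent client-side work would look like with find() alone. It is illustrative only: it pulls every document over the wire and then groups and sorts in application code, which is exactly what the server-side pipeline spares us.

```js
// Hypothetical client-side equivalent using find() only:
// every document must be transferred to the client first,
// then grouped and sorted in application code.
const counts = {};
db.persons.find({}, { "location.state": 1 }).forEach(doc => {
  const state = doc.location.state;
  counts[state] = (counts[state] || 0) + 1;
});
const sorted = Object.keys(counts)
  .map(state => ({ _id: { state: state }, totalPersons: counts[state] }))
  .sort((a, b) => b.totalPersons - a.totalPersons);
// With aggregate, all of this runs on the MongoDB server instead.
```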