1
00:00:02,220 --> 00:00:08,640
Now to make this a bit easier to follow, I will write my queries in a JavaScript helper file for now

2
00:00:08,700 --> 00:00:10,810
and then just copy them over into the console,

3
00:00:10,830 --> 00:00:15,510
since I believe this is a bit easier to see than if I directly typed them in there, as we will add more and

4
00:00:15,510 --> 00:00:22,260
more lines or stages and adding stages is easier here too because I can enter them in-between without sending

5
00:00:22,260 --> 00:00:22,650
everything

6
00:00:22,650 --> 00:00:23,730
by hitting enter.

7
00:00:24,060 --> 00:00:29,760
So we saw the match stage in the last lecture, and match essentially just takes a filter, just as you would

8
00:00:29,760 --> 00:00:33,230
define it as an argument to the find method.
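
As a rough sketch of that equivalence: the same filter document can be handed to find or wrapped in a match stage. The collection name persons and the gender field are assumptions based on the data shown on screen, and the names femaleFilter and matchStage are purely illustrative.

```javascript
// Assumed filter: the lecture data currently shows only females.
const femaleFilter = { gender: "female" };

// The $match stage simply wraps the same filter document that find() takes.
const matchStage = { $match: femaleFilter };

// In the mongo shell (not runnable in plain Node), both calls select the same documents:
// db.persons.find(femaleFilter)
// db.persons.aggregate([matchStage])
console.log(JSON.stringify(matchStage));
```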

9
00:00:33,270 --> 00:00:42,030
More interesting is the group stage. The group stage allows you to, well, group your data by a certain

10
00:00:42,030 --> 00:00:46,630
field or by multiple fields. For that, let's have a look at our data again.

11
00:00:46,800 --> 00:00:52,920
We got our persons there, only females right now, so by what could we group them?

12
00:00:53,010 --> 00:00:59,310
Now let's say we want to group by this state here and we want to see the total number of persons living in that

13
00:00:59,310 --> 00:01:03,540
state. With the aggregate method and the aggregation framework,

14
00:01:03,540 --> 00:01:06,780
this is easy to do. In group here,

15
00:01:06,810 --> 00:01:08,820
we need to define a couple of parameters,

16
00:01:08,850 --> 00:01:10,800
the first one always is

17
00:01:10,820 --> 00:01:15,210
_id. Now _id defines by which fields you want to group

18
00:01:15,360 --> 00:01:18,880
and now we will use _id in a way we haven't seen it before,

19
00:01:18,900 --> 00:01:22,370
the value for _id will be a document.

20
00:01:22,560 --> 00:01:28,530
Thus far, we always used an ObjectId, a string or maybe a number, but we never used a document,

21
00:01:28,740 --> 00:01:33,400
but just as with any other field, you can assign a document to _id.

22
00:01:33,570 --> 00:01:35,490
It's just not that common to be honest

23
00:01:35,610 --> 00:01:42,450
but for the group stage here, you often see that syntax because it will be interpreted

24
00:01:42,510 --> 00:01:49,640
in a special way and it will basically allow you to define multiple fields by which you want to group.

25
00:01:49,650 --> 00:01:56,970
So in my case, I want to group by location state and I can do this by assigning a key here which I give

26
00:01:57,000 --> 00:01:58,280
any name I want,

27
00:01:58,530 --> 00:02:07,560
state for example, and then the string "$location.state". The dollar sign is important here because

28
00:02:07,560 --> 00:02:15,600
it tells MongoDB that I'm referring to a field of our document which is passed into the group stage,

29
00:02:15,720 --> 00:02:17,920
so I'm referring to a field of this document,

30
00:02:18,000 --> 00:02:22,610
the location field and then I can access a nested field just with a dot, here

31
00:02:22,620 --> 00:02:31,030
no dollar sign is required. So this should now group our results by the state.

32
00:02:31,040 --> 00:02:37,940
Now we can add a new key to each output document and you can name this key however you want, like total persons.

33
00:02:37,940 --> 00:02:44,210
Now here you would pass a document where you now describe the kind of aggregation function you want

34
00:02:44,210 --> 00:02:45,470
to execute,

35
00:02:45,560 --> 00:02:50,260
now these aggregation functions can also be found in the official docs of course.

36
00:02:50,570 --> 00:02:59,890
There if you go to the group stage, you'll see all the aggregation or accumulator operators that are

37
00:02:59,890 --> 00:03:01,000
supported

38
00:03:01,300 --> 00:03:04,530
and for us, the sum operator is interesting.

39
00:03:04,540 --> 00:03:10,300
Now as always, read through all these docs and all these examples to learn all about the niche and edge

40
00:03:10,300 --> 00:03:12,800
cases of each stage or each operator,

41
00:03:12,970 --> 00:03:17,700
we will use the sum here by using $sum and then a value

42
00:03:17,830 --> 00:03:22,600
you want to add for every document that is grouped together.

43
00:03:22,690 --> 00:03:30,290
So if we have three people from the same location state, the sum would be incremented by 1 three times

44
00:03:30,730 --> 00:03:33,530
and the interesting thing here is that MongoDB

45
00:03:33,640 --> 00:03:36,610
will basically do this summing up for us,

46
00:03:36,640 --> 00:03:43,630
it will keep the aggregated sum in memory until it's done with a group and then write the total sum into

47
00:03:43,630 --> 00:03:44,920
this field.
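
Put together, the pipeline described so far could look roughly like this. It is a sketch under assumptions: the collection name persons, the gender filter, and the camel-case spelling totalPersons are inferred from the narration rather than confirmed by it.

```javascript
// Assumed names: `gender` filter and `totalPersons` key are inferred from the narration.
const groupPipeline = [
  // Stage 1: keep only the female persons.
  { $match: { gender: "female" } },
  // Stage 2: one output document per state, counting 1 for every grouped document.
  {
    $group: {
      _id: { state: "$location.state" },
      totalPersons: { $sum: 1 }
    }
  }
];

// In the mongo shell (not runnable in plain Node) this would be executed as:
// db.persons.aggregate(groupPipeline).pretty();
console.log(JSON.stringify(groupPipeline, null, 2));
```

Keeping the pipeline in a variable in the helper file makes it easy to insert further stages between the existing ones before copying it into the console.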

48
00:03:44,920 --> 00:03:50,130
It's also important to understand that group does accumulate data,

49
00:03:50,260 --> 00:03:58,210
now that simply means that you might have multiple documents with the same state and group will only

50
00:03:58,210 --> 00:04:05,250
output one, so three documents with the same state will be merged into one because you are aggregating,

51
00:04:05,260 --> 00:04:07,930
you're building a sum in this case.
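
To make that merging visible without a database, here is a small plain-JavaScript imitation of grouping by state and adding 1 per document. It only mimics the idea and is not how MongoDB implements it; the sample documents and the function name groupByState are hypothetical.

```javascript
// Hypothetical sample documents; only the fields used here are included.
const samplePersons = [
  { gender: "female", location: { state: "sinop" } },
  { gender: "female", location: { state: "berlin" } },
  { gender: "female", location: { state: "sinop" } },
  { gender: "female", location: { state: "sinop" } }
];

// Imitates { $group: { _id: { state: "$location.state" }, totalPersons: { $sum: 1 } } }.
function groupByState(docs) {
  const totals = new Map();
  for (const doc of docs) {
    const state = doc.location.state;
    // Add 1 to the running total of this state for every document seen.
    totals.set(state, (totals.get(state) || 0) + 1);
  }
  // Emit exactly one output document per distinct state.
  return Array.from(totals, ([state, totalPersons]) => ({ _id: { state }, totalPersons }));
}

const grouped = groupByState(samplePersons);
console.log(grouped);
// Four input documents become two output documents: sinop with 3, berlin with 1.
```

Note that this imitation returns the groups in first-seen order, whereas the real group stage gives no guarantee about output order.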

52
00:04:07,930 --> 00:04:09,320
OK so enough of the talking,

53
00:04:09,400 --> 00:04:10,810
let's now run this query.

54
00:04:10,990 --> 00:04:13,030
Let's simply copy it here,

55
00:04:13,150 --> 00:04:24,360
move over into the console and then paste it in and you see I get back some results, but I should move

56
00:04:24,430 --> 00:04:29,740
pretty one line higher so that pretty is also understood,

57
00:04:30,600 --> 00:04:33,580
so now I can execute this and I get back some results.

58
00:04:33,690 --> 00:04:38,670
Now what we see is we still have a lot of different states here in our data but we can already see the

59
00:04:38,670 --> 00:04:40,050
aggregation seems to work,

60
00:04:40,050 --> 00:04:41,880
we get a totally different output,

61
00:04:41,910 --> 00:04:45,110
we no longer have any person data because we changed it.

62
00:04:45,150 --> 00:04:52,890
We used group to merge our documents into new documents with totally different data with the total persons

63
00:04:53,130 --> 00:04:59,790
and that _id. That _id, as you can tell, is that object we defined with the state in which it's grouped

64
00:04:59,790 --> 00:05:05,400
and even though most states only have one person which is simply related to the structure of our demo data

65
00:05:05,400 --> 00:05:05,670
set

66
00:05:05,670 --> 00:05:12,210
here, we see there are some states which seem to have more persons like sinop. And we can simply prove

67
00:05:12,210 --> 00:05:19,230
that our aggregation is working correctly here by manually reaching out to our persons and finding all

68
00:05:19,230 --> 00:05:25,760
persons where the location state is equal to sinop.
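
The manual check could be written like this. Again a sketch: the collection name persons and the gender field are assumed, and the narrower female-only filter is an optional addition that is not part of the lecture.

```javascript
// Filter on the nested field; dot notation in a key must be quoted.
const stateFilter = { "location.state": "sinop" };

// In the mongo shell (not runnable in plain Node):
// db.persons.find(stateFilter).pretty();

// Optional, not from the lecture: narrow the check to females right away.
const femaleInStateFilter = { ...stateFilter, gender: "female" };
// db.persons.find(femaleInStateFilter).pretty();
console.log(JSON.stringify(femaleInStateFilter));
```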

69
00:05:25,770 --> 00:05:33,700
So if I add pretty here, I should find three females. Importantly, I might find more persons, but only three females.

70
00:05:33,780 --> 00:05:41,780
So if I scroll up, I get one female. Here's a male, so we can't count him because we filtered for females in our aggregation

71
00:05:41,780 --> 00:05:42,750
pipeline,

72
00:05:42,860 --> 00:05:46,810
here's another female and this is the last record and also a female,

73
00:05:46,820 --> 00:05:52,160
so we get three females in this state. And this is the group stage in action,

74
00:05:52,160 --> 00:05:53,980
now let's play around with that a bit more.