1 00:00:02,220 --> 00:00:08,640 Now to make this a bit easier to follow, I will write my queries in a JavaScript helper file for now 2 00:00:08,700 --> 00:00:10,810 and then just copy them over into the console, 3 00:00:10,830 --> 00:00:15,510 since I believe this is a bit easier to see than if I directly typed them in there, as we will add more and 4 00:00:15,510 --> 00:00:22,260 more lines or stages and adding stages is easier here too because I can enter them in-between without sending 5 00:00:22,260 --> 00:00:22,650 everything 6 00:00:22,650 --> 00:00:23,730 by hitting enter. 7 00:00:24,060 --> 00:00:29,760 So we saw the match stage in the last lecture and match essentially just takes a filter, as you would 8 00:00:29,760 --> 00:00:33,230 define it as an argument to the find method. 9 00:00:33,270 --> 00:00:42,030 More interesting is the group stage, the group stage allows you to, well, group your data by a certain 10 00:00:42,030 --> 00:00:46,630 field or by multiple fields, for that let's have a look at our data again. 11 00:00:46,800 --> 00:00:52,920 We got our persons there, only females right now and now by what could we group them? 12 00:00:53,010 --> 00:00:59,310 Now let's say we want to group by this state here and we want to see the sum of persons living in that 13 00:00:59,310 --> 00:01:03,540 state, with the aggregate method and the aggregation framework, 14 00:01:03,540 --> 00:01:06,780 this is easy to do. In group here, 15 00:01:06,810 --> 00:01:08,820 we need to define a couple of parameters, 16 00:01:08,850 --> 00:01:10,800 the first one always is 17 00:01:10,820 --> 00:01:15,210 _id. Now _id defines by which fields you want to group 18 00:01:15,360 --> 00:01:18,880 and now we will use _id in a way we haven't seen before, 19 00:01:18,900 --> 00:01:22,370 the value for _id will be a document.
20 00:01:22,560 --> 00:01:28,530 Thus far, we always used an ObjectId, a string or maybe a number but we never used a document 21 00:01:28,740 --> 00:01:33,400 but just as with any other field, you can assign a document to _id. 22 00:01:33,570 --> 00:01:35,490 It's just not that common to be honest 23 00:01:35,610 --> 00:01:42,450 but for the group method here, for the group stage, you often see that syntax because that will be interpreted 24 00:01:42,510 --> 00:01:49,640 in a special way and it will basically allow you to define multiple fields by which you want to group. 25 00:01:49,650 --> 00:01:56,970 So in my case, I want to group by location state and I can do this by assigning a key here which I give 26 00:01:57,000 --> 00:01:58,280 any name I want, 27 00:01:58,530 --> 00:02:07,560 state for example and then $location.state. The dollar sign is important here because 28 00:02:07,560 --> 00:02:15,600 it tells MongoDB that I'm referring to a field of our document which is passed into the group stage, 29 00:02:15,720 --> 00:02:17,920 so I'm referring to a field of this document, 30 00:02:18,000 --> 00:02:22,610 the location field and then I can access a nested field just with a dot, here 31 00:02:22,620 --> 00:02:31,030 no dollar sign is required. So this should now group our results by the state. 32 00:02:31,040 --> 00:02:37,940 Now we can add a new key to each document and you can name this key however you want, like total persons. 33 00:02:37,940 --> 00:02:44,210 Now here you would pass a document where you now describe the kind of aggregation function you want 34 00:02:44,210 --> 00:02:45,470 to execute, 35 00:02:45,560 --> 00:02:50,260 now these aggregation functions can also be found in the official docs of course.
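The `_id` document described above might look like the following in the helper file. This is only a minimal sketch: `state` is the arbitrary key name chosen in the lecture, while `resolveFieldPath` and the sample `person` document are hypothetical, written purely to illustrate how a `$`-prefixed string refers to a (possibly nested) field of the incoming document. It is not how MongoDB itself resolves field paths.

```javascript
// The $group stage with a document as _id, as described in the lecture.
// "state" is an arbitrary key name; "$location.state" is a field path.
const groupStage = {
  $group: {
    _id: { state: "$location.state" }
  }
};

// Hypothetical helper (not a MongoDB API): resolves a "$a.b" field path
// against a plain object to illustrate what the dollar sign refers to.
function resolveFieldPath(doc, path) {
  if (!path.startsWith("$")) {
    return path; // without the dollar sign, the string is a literal value
  }
  return path
    .slice(1)   // drop the leading "$"
    .split(".") // nested fields are separated by dots, no extra "$" needed
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Hypothetical sample document shaped like the persons in the lecture.
const person = { gender: "female", location: { state: "sinop" } };

console.log(resolveFieldPath(person, groupStage.$group._id.state)); // prints "sinop"
```

Only the leading dollar sign marks the string as a field reference; the nested `state` part is reached with a plain dot.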
36 00:02:50,570 --> 00:02:59,890 There if you go to the group stage, you'll see all the aggregation or accumulator operators that are 37 00:02:59,890 --> 00:03:01,000 supported 38 00:03:01,300 --> 00:03:04,530 and for us, the sum operator is interesting. 39 00:03:04,540 --> 00:03:10,300 Now as always, read through all these docs and all these examples to learn all about the niche and edge 40 00:03:10,300 --> 00:03:12,800 cases of each stage or each operator, 41 00:03:12,970 --> 00:03:17,700 we will use the sum here by using $sum and then a value 42 00:03:17,830 --> 00:03:22,600 you want to add for every document that is grouped together. 43 00:03:22,690 --> 00:03:30,290 So if we have three people from the same location state, sum would be incremented by 1 three times 44 00:03:30,730 --> 00:03:33,530 and the interesting thing here is that MongoDB 45 00:03:33,640 --> 00:03:36,610 will basically do this summing up for us, 46 00:03:36,640 --> 00:03:43,630 it will keep the aggregated sum in memory until it's done with a group and then write the total sum into 47 00:03:43,630 --> 00:03:44,920 this field. 48 00:03:44,920 --> 00:03:50,130 It's also important to understand that group does accumulate data, 49 00:03:50,260 --> 00:03:58,210 now that simply means that you might have multiple documents with the same state and group will only 50 00:03:58,210 --> 00:04:05,250 output one, so three documents with the same state will be merged into one because you are aggregating, 51 00:04:05,260 --> 00:04:07,930 you're building a sum in this case. 52 00:04:07,930 --> 00:04:09,320 OK so enough of the talking, 53 00:04:09,400 --> 00:04:10,810 let's now run this function.
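Put together, the match and group stages described so far could be sketched as below. The collection name `persons`, the `gender` field, and the key name `totalPersons` are assumptions based on how the lecture talks about the data. `countFemalesByState` and the sample array are hypothetical and exist only so the example runs without a database: they mimic what the two stages do to the documents, not how MongoDB executes them.

```javascript
// Pipeline as it might appear in the helper file. In the mongo shell it
// would be passed along the lines of db.persons.aggregate(pipeline);
// "persons" and "gender" are assumed names, not confirmed by the lecture.
const pipeline = [
  { $match: { gender: "female" } },
  { $group: { _id: { state: "$location.state" }, totalPersons: { $sum: 1 } } }
];

// Hypothetical in-memory stand-in for the two stages (not MongoDB code).
function countFemalesByState(persons) {
  const totals = new Map();
  for (const p of persons) {
    if (p.gender !== "female") continue;             // $match: { gender: "female" }
    const state = p.location.state;                  // "$location.state"
    totals.set(state, (totals.get(state) || 0) + 1); // $sum: 1 adds 1 per document
  }
  // One output document per distinct state, as $group would produce.
  return Array.from(totals, ([state, totalPersons]) => ({
    _id: { state },
    totalPersons
  }));
}

// Made-up sample data for illustration only.
const persons = [
  { gender: "female", location: { state: "sinop" } },
  { gender: "male", location: { state: "sinop" } },
  { gender: "female", location: { state: "sinop" } },
  { gender: "female", location: { state: "sinop" } },
  { gender: "female", location: { state: "berlin" } }
];

console.log(countFemalesByState(persons));
```

With this sample data, the three female documents for `sinop` collapse into a single output document whose `totalPersons` is 3, which is the merging behaviour the lecture describes.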
54 00:04:10,990 --> 00:04:13,030 Let's simply copy it here, 55 00:04:13,150 --> 00:04:24,360 move over into the console and then paste it in and you see I get back some results and I should move 56 00:04:24,430 --> 00:04:29,740 pretty one line higher so that pretty is also understood, 57 00:04:30,600 --> 00:04:33,580 so now I can execute this and I get back some results. 58 00:04:33,690 --> 00:04:38,670 Now what we see is we still have a lot of different states here in our data but we can already see the 59 00:04:38,670 --> 00:04:40,050 aggregation seems to work, 60 00:04:40,050 --> 00:04:41,880 we get a totally different output, 61 00:04:41,910 --> 00:04:45,110 we no longer have any person data because we changed it. 62 00:04:45,150 --> 00:04:52,890 We used group to merge our documents into new documents with totally different data with the total persons 63 00:04:53,130 --> 00:04:59,790 and that _id, and that _id as you can tell is that object we defined with the state by which it's grouped 64 00:04:59,790 --> 00:05:05,400 and even though most states only have one person which is simply related to the structure of our demo data 65 00:05:05,400 --> 00:05:05,670 set 66 00:05:05,670 --> 00:05:12,210 here, we see there are some states which seem to have more persons like sinop. And we can simply prove 67 00:05:12,210 --> 00:05:19,230 that our aggregation is working correctly here by manually reaching out to our persons and finding all 68 00:05:19,230 --> 00:05:25,760 persons where the location state is equal to sinop. 69 00:05:25,770 --> 00:05:33,700 So if I add pretty here, I should find three females, importantly I might find more persons but three females.
70 00:05:33,780 --> 00:05:41,780 So if I scroll up, get one female, here's a male so we can't count him in because we filtered for females in our aggregation 71 00:05:41,780 --> 00:05:42,750 pipeline, 72 00:05:42,860 --> 00:05:46,810 here's another female and this is the last record and also a female, 73 00:05:46,820 --> 00:05:52,160 so we get three females in this state. And this is the group stage in action, 74 00:05:52,160 --> 00:05:53,980 now let's play around with that a bit more.
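The manual check described above can be sketched as follows. In the mongo shell it would presumably be something like `db.persons.find({ "location.state": "sinop" }).pretty()`, where the collection name `persons` is an assumption. The `findByState` helper and the sample data are hypothetical and only mimic that filter in memory, so the counting step can be tried without a database.

```javascript
// Filter document as it would be passed to find() in the shell.
// The nested field is addressed with dot notation inside a quoted key.
const filter = { "location.state": "sinop" };

// Hypothetical in-memory equivalent of that find() call (not MongoDB code).
function findByState(persons, state) {
  return persons.filter((p) => p.location.state === state);
}

// Made-up sample data: several persons in sinop, one of them male.
const persons = [
  { gender: "female", location: { state: "sinop" } },
  { gender: "male", location: { state: "sinop" } },
  { gender: "female", location: { state: "sinop" } },
  { gender: "female", location: { state: "sinop" } },
  { gender: "female", location: { state: "berlin" } }
];

const inSinop = findByState(persons, filter["location.state"]);

// The pipeline only counted females, so only those are compared
// against the total reported by the group stage.
const femalesInSinop = inSinop.filter((p) => p.gender === "female");

console.log(inSinop.length, femalesInSinop.length); // prints "4 3"
```

The plain find returns every person in the state, so the male document has to be left out by hand before comparing the count with the aggregation result.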