1 00:00:02,230 --> 00:00:03,480 Now here is the data 2 00:00:03,670 --> 00:00:06,540 as we have it in our dataset again. 3 00:00:06,580 --> 00:00:12,220 Now let's have a look at a different pipeline stage that allows us to transform every document instead of grouping 4 00:00:12,220 --> 00:00:13,260 multiple together 5 00:00:13,450 --> 00:00:16,110 and that will be the project stage. 6 00:00:16,120 --> 00:00:19,070 Now we already know projection from the find method, 7 00:00:19,150 --> 00:00:23,200 well as an aggregate stage, project became more powerful, 8 00:00:23,320 --> 00:00:25,930 so what can project do for us? 9 00:00:26,140 --> 00:00:30,710 Let's now start simple and not have any other set, I should say we don't do any filtering, 10 00:00:30,730 --> 00:00:33,700 we just want to transform every document. 11 00:00:33,760 --> 00:00:41,580 Then we can use the project stage and as all stages, the value for $project here is simply a document 12 00:00:41,790 --> 00:00:44,790 where you well configure that stage so to say 13 00:00:45,090 --> 00:00:53,230 and in its most simple form, project works in the same way as the projection works in the find method. 14 00:00:53,430 --> 00:00:58,560 So you can for example say I don't want to have the ID by setting it to zero here, 15 00:00:58,830 --> 00:01:06,600 you could say I only want to have the gender, I want to have the name field and location and email but 16 00:01:06,600 --> 00:01:10,580 I want to reformat some of these fields. So including gender is simple, 17 00:01:10,620 --> 00:01:11,730 we just add a one 18 00:01:11,970 --> 00:01:14,670 but let's say the name should be different 19 00:01:14,670 --> 00:01:20,670 now, let's say the name should be one field instead of this embedded document 20 00:01:20,820 --> 00:01:25,810 and this is something we can easily do with this project stage. 
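As a rough sketch, the simplest form of the stage described so far could look like this. The collection name `persons` is an assumption, since it is not named at this point, and the pipeline is stored in a variable only so the snippet stands on its own.

```javascript
// Minimal sketch of a $project stage in its simplest form.
// Assumption: the collection is called "persons" (not stated in the lecture so far).
const basicProjection = [
  {
    $project: {
      _id: 0,    // exclude the ID by setting it to zero
      gender: 1  // include gender unchanged by adding a one
    }
  }
];

// In the mongo shell this would be run roughly as:
// db.persons.aggregate(basicProjection).pretty();
```

Just like the projection in find, a 1 includes a field and a 0 excludes it.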
21 00:01:25,830 --> 00:01:32,670 We can add new fields here which is a cool feature and we could add the full name field and that full 22 00:01:32,670 --> 00:01:42,330 name field could then simply be created on the fly based on the nested name first and name last fields. 23 00:01:42,330 --> 00:01:44,130 Now how would that work? 24 00:01:44,130 --> 00:01:50,070 There is a special operator we can use and you'll find that in the official docs, a full list of all the 25 00:01:50,280 --> 00:01:53,880 operators can be found in an article after this lecture. 26 00:01:54,060 --> 00:01:59,640 So you can use a special operator, so you pass an object first of all to full name because we're going 27 00:01:59,640 --> 00:02:01,050 to perform an operation 28 00:02:01,380 --> 00:02:08,550 and then here, we'll use concat. Concat, $concat allows you to concatenate two strings and you 29 00:02:08,550 --> 00:02:12,270 simply pass an array here which contains the two strings. 30 00:02:12,270 --> 00:02:18,330 Now you could hardcode some values in here like hello world, 31 00:02:18,360 --> 00:02:19,600 that is also possible, 32 00:02:19,620 --> 00:02:22,640 you can work with hardcoded data. If I now 33 00:02:22,650 --> 00:02:23,610 copy that, 34 00:02:25,610 --> 00:02:30,200 you'll see every person is named HelloWorld, concatenated together like this, 35 00:02:30,200 --> 00:02:32,920 also note that there is no whitespace in between. 36 00:02:32,920 --> 00:02:38,870 This is of course something you can do but most likely, this is not the result you want to have. 37 00:02:39,190 --> 00:02:43,880 So instead what we want to do is we want to refer to the first and last keys here 38 00:02:43,990 --> 00:02:45,970 in the name field, 39 00:02:46,150 --> 00:02:51,700 so what we can do for that is we can use a special syntax again in a string 40 00:02:51,700 --> 00:02:54,740 and the quotation marks are important, with a dollar sign, 41 00:02:54,880 --> 00:02:57,500 we can refer to name first.
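The hardcoded variant mentioned above might be written roughly like this. It is only a sketch: the collection name `persons` and the exact casing of `fullName` are assumptions.

```javascript
// Sketch: $concat with two hardcoded strings.
// Assumptions: collection "persons", new field spelled "fullName".
const helloWorldProjection = [
  {
    $project: {
      _id: 0,
      gender: 1,
      // $concat receives an array of strings; both are literal text here
      fullName: { $concat: ["Hello", "World"] }
    }
  }
];

// db.persons.aggregate(helloWorldProjection).pretty();

// Plain JavaScript mirror of the concatenation, for illustration only:
// the strings are joined directly, without any whitespace in between.
const joined = ["Hello", "World"].join(""); // "HelloWorld"
```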
42 00:02:57,550 --> 00:03:02,530 Now the dollar sign just simply tells MongoDB that this is not hardcoded text which it should 43 00:03:02,530 --> 00:03:08,290 take like this but that this refers to a field of the incoming document and that it should take the 44 00:03:08,290 --> 00:03:10,180 value of that field instead 45 00:03:10,240 --> 00:03:13,060 and in this case, it's just an embedded field, 46 00:03:13,060 --> 00:03:18,850 it could also be a normal field, a non-embedded one. So we can have the first name, 47 00:03:19,000 --> 00:03:21,760 then let's say we add a whitespace in-between 48 00:03:21,760 --> 00:03:27,910 and then here, we use name last to refer to this name here. 49 00:03:29,160 --> 00:03:38,190 If we now take this code and we run that, we indeed get back documents, the same number of documents as 50 00:03:38,200 --> 00:03:39,180 before by the way 51 00:03:39,190 --> 00:03:45,610 because unlike group, project does not group multiple documents together, it just transforms every single 52 00:03:45,610 --> 00:03:48,630 document and therefore we get the same number of documents 53 00:03:48,700 --> 00:03:50,390 but with totally different data 54 00:03:50,590 --> 00:03:56,050 and the interesting part of course is that we can not only include and exclude data but that we can even 55 00:03:56,050 --> 00:04:02,920 add new fields with hardcoded values if we want to, or in our case, with a value derived from 56 00:04:02,920 --> 00:04:05,570 the data that was in the document before. 57 00:04:05,650 --> 00:04:11,970 Now let's say we also want to make sure that the first and last names start with uppercase characters 58 00:04:12,640 --> 00:04:15,520 and we can do that too with the projection phase, 59 00:04:15,550 --> 00:04:18,470 I'll just restructure this real quick to make it a bit easier to read, 60 00:04:18,670 --> 00:04:21,770 we'll have to make sure to move these up later.
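The concatenation of the first and last name described above can be sketched like this, before any uppercase handling is added. The collection name `persons` and the field spelling `fullName` are assumptions.

```javascript
// Sketch: building fullName from the embedded name.first and name.last fields.
// Assumptions: collection "persons", new field spelled "fullName".
const fullNameProjection = [
  {
    $project: {
      _id: 0,
      gender: 1,
      fullName: {
        // "$name.first" (quoted, with a dollar sign) refers to the field's value,
        // while " " is plain hardcoded text used as the separator
        $concat: ["$name.first", " ", "$name.last"]
      }
    }
  }
];

// db.persons.aggregate(fullNameProjection).pretty();
```

Without the leading dollar sign, `"name.first"` would be taken as literal text rather than as a field reference.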
61 00:04:21,770 --> 00:04:26,010 So right now, we just concatenate our first and last names 62 00:04:26,140 --> 00:04:32,170 but of course we could transform these first and last names too before we concatenate them, so we can 63 00:04:32,170 --> 00:04:39,130 pass more complex expressions to concatenate essentially. The first name can be an expression described 64 00:04:39,130 --> 00:04:43,070 in a document and so can the last name. 65 00:04:43,150 --> 00:04:49,490 So in the first name here, we can use $toUpper which is another operator offered by 66 00:04:49,490 --> 00:04:50,800 mongodb 67 00:04:50,870 --> 00:04:58,410 and to upper can simply receive the field which it should turn into uppercase, so name first. 68 00:04:58,640 --> 00:05:05,860 Now it's the same syntax I need for the last element which we concatenate together, just add its 69 00:05:05,860 --> 00:05:13,320 name last here. Now to upper, this expression here and this expression, these will return strings so concatenate 70 00:05:13,320 --> 00:05:19,930 will still work on three strings, just that the first and the last one are also well transformed by us. 71 00:05:19,930 --> 00:05:24,140 Let me now move these two methods up and let's copy that, 72 00:05:24,580 --> 00:05:29,170 if we now run our aggregation, well everything is uppercase 73 00:05:29,170 --> 00:05:33,560 now, certainly a change but what if we only want to work with the first characters? 74 00:05:33,880 --> 00:05:36,320 Well then we drill into that further. 75 00:05:36,580 --> 00:05:42,140 Now we don't just run to upper on the entire first name or entire last name 76 00:05:42,280 --> 00:05:50,020 instead we want to run it on just the first character of these names and then we want to append the rest 77 00:05:50,020 --> 00:05:51,990 of that word in 78 00:05:52,030 --> 00:05:59,150 well the normal casing. 
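The intermediate step, where both names end up entirely in uppercase, might look roughly like this. Again, `persons` and `fullName` are assumed names.

```javascript
// Sketch: wrapping each name in $toUpper before concatenating.
// Assumptions: collection "persons", new field spelled "fullName".
const upperCaseProjection = [
  {
    $project: {
      _id: 0,
      gender: 1,
      fullName: {
        $concat: [
          { $toUpper: "$name.first" }, // expression wrapped in a document, yields a string
          " ",
          { $toUpper: "$name.last" }
        ]
      }
    }
  }
];

// db.persons.aggregate(upperCaseProjection).pretty();
```

Each `$toUpper` expression returns a string, so `$concat` still receives three strings.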
For this, the thing which I transform with to upper is the result of yet another 79 00:05:59,230 --> 00:06:01,490 expression wrapped into a document, 80 00:06:01,540 --> 00:06:05,920 you see a pattern here, all these expressions are always wrapped into documents. 81 00:06:05,920 --> 00:06:13,940 Now there I use another operator, the $substrCP operator which returns the substring of well a string, 82 00:06:13,960 --> 00:06:15,680 so a part of a string. 83 00:06:15,760 --> 00:06:17,600 $substrCP takes an array, 84 00:06:17,650 --> 00:06:19,600 the first argument is the string, 85 00:06:19,660 --> 00:06:22,120 so here, I now use name first. 86 00:06:22,300 --> 00:06:27,870 The second argument is the starting character of your substring, this will be zero 87 00:06:27,880 --> 00:06:30,610 because strings are zero-indexed, 88 00:06:30,610 --> 00:06:37,000 so this is the first character of the string and then it asks you for how many characters should be included 89 00:06:37,000 --> 00:06:37,880 in the substring 90 00:06:37,960 --> 00:06:39,410 and that should be one here, 91 00:06:39,430 --> 00:06:41,360 so just the first character. 92 00:06:41,380 --> 00:06:46,200 Now this will turn the first character of the name into an uppercase character, 93 00:06:46,270 --> 00:06:50,210 of course we don't just want to have the first character of the name, 94 00:06:50,320 --> 00:06:55,030 we also want to have the other characters, just not converted to uppercase. 95 00:06:55,360 --> 00:07:01,810 So I will add an additional element to the concat array and that should now be the rest of the first 96 00:07:01,810 --> 00:07:07,030 name because here, we're extracting the first character and we're converting it to uppercase, 97 00:07:07,030 --> 00:07:10,440 now we need the rest of that name.
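The first building block from above, an uppercase first character, could be sketched as the following expression. The sample value `"gideon"` is made up for illustration, and the plain JavaScript lines only mirror the idea; they are not how MongoDB evaluates it.

```javascript
// Sketch: uppercase only the first character of name.first.
// $substrCP takes [string, start index, number of characters].
const upperFirstInitial = {
  $toUpper: { $substrCP: ["$name.first", 0, 1] }
};

// Plain JavaScript mirror with an assumed sample value:
const sampleFirst = "gideon";
const initial = sampleFirst.substring(0, 1).toUpperCase(); // "G"
```

On its own this expression yields just the initial, which is why the rest of the name has to be appended as a further element of the `$concat` array.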
For that, 98 00:07:10,490 --> 00:07:16,780 I'll again use my substrCP operator here because I need a substring of name first, 99 00:07:16,810 --> 00:07:18,150 I just need the other 100 00:07:18,220 --> 00:07:19,310 well half or 101 00:07:19,330 --> 00:07:24,030 well the rest of it and therefore, I start at character 1 102 00:07:24,100 --> 00:07:30,940 but now of course I need to find out how many characters I need and for that, I need to dynamically derive 103 00:07:30,940 --> 00:07:37,750 how long the name is and we can do this with another expression and it's not uncommon to have nested 104 00:07:37,780 --> 00:07:40,010 and more complex stages like this one. 105 00:07:40,030 --> 00:07:42,270 So don't worry if this looks very frightening, 106 00:07:42,490 --> 00:07:47,380 obviously this is a more complex transformation we're doing here but I find it super important for you 107 00:07:47,380 --> 00:07:53,290 to understand how you can work with all these expressions and operators, especially in the project phase 108 00:07:53,470 --> 00:07:56,270 where you do a lot of transformations typically. 109 00:07:56,290 --> 00:08:04,090 So now here, what I want to do is I want to use another operator, the subtract operator which simply 110 00:08:04,090 --> 00:08:06,010 returns the difference of two numbers, 111 00:08:06,010 --> 00:08:07,770 now why do I need that? 112 00:08:07,930 --> 00:08:16,420 The numbers are passed in an array here, and I need it because I have to find out the length of my name and then 113 00:08:16,420 --> 00:08:22,790 subtract one from that because we start after the first character in this substring. 114 00:08:22,810 --> 00:08:33,340 So here, I will now use another operator, strLenCP which calculates the length of a string and there, 115 00:08:33,350 --> 00:08:35,220 I point at name first 116 00:08:35,230 --> 00:08:40,130 again and then, I subtract one from that, 117 00:08:40,130 --> 00:08:44,170 so this is the second element in the subtract array.
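The "rest of the name" expression described above can be sketched as follows. The sample value `"gideon"` is an assumption used only for the plain JavaScript mirror of the arithmetic.

```javascript
// Sketch: everything after the first character of name.first.
// Start at index 1 and take (length - 1) characters.
const restOfFirstName = {
  $substrCP: [
    "$name.first",
    1,
    { $subtract: [{ $strLenCP: "$name.first" }, 1] } // length of the name minus one
  ]
};

// Plain JavaScript mirror with an assumed sample value:
const sampleName = "gideon";
const remaining = sampleName.length - 1;              // 5 characters left
const rest = sampleName.substring(1, 1 + remaining);  // "ideon"
```

The third argument of `$substrCP` is a character count, not an end index, which is why the length has to be reduced by one.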
118 00:08:44,190 --> 00:08:47,050 Ok so this is quite a complex operation 119 00:08:47,070 --> 00:08:52,530 but in the end what I do here is I simply retrieve the rest of the name by starting after the first 120 00:08:52,530 --> 00:08:53,130 character 121 00:08:53,220 --> 00:08:58,410 and then I find out how long that name is if I well reduce it by the one character, 122 00:08:58,410 --> 00:09:01,740 so if I subtract that one starting character. 123 00:09:01,740 --> 00:09:05,280 Now previously we had Gideon Van Drongelen, 124 00:09:05,430 --> 00:09:11,700 let's now repeat this for the last name and see if we have Gideon Van Drongelen where only the 125 00:09:11,700 --> 00:09:14,620 first character is an uppercase character. 126 00:09:14,640 --> 00:09:20,990 So let me grab this entire code and let's replace this code with it 127 00:09:21,170 --> 00:09:27,700 and now we just need to replace name first with name last in all three places and that should be it. 128 00:09:28,070 --> 00:09:36,440 Let's now move pretty up and aggregate up and copy that entire command, 129 00:09:36,470 --> 00:09:39,230 let's move over and execute it 130 00:09:39,280 --> 00:09:41,010 and this looks much better. 131 00:09:41,050 --> 00:09:44,170 Now we have only the first characters of each name 132 00:09:44,170 --> 00:09:49,630 converted to uppercase characters and you'll learn a lot about operators and how you typically work with 133 00:09:49,630 --> 00:09:55,990 them and combine them especially in the project phase which is all about transforming data.
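Putting the pieces together, the final stage might look roughly like this. The lecture copies the expression and swaps name first for name last by hand; the helper `capitalizedParts` below is a hypothetical shortcut for the same thing, and `persons` and `fullName` are assumed names.

```javascript
// Hypothetical helper (not from the lecture): builds the two expressions that
// together capitalize one field, i.e. the uppercase initial plus the unchanged rest.
function capitalizedParts(fieldPath) {
  return [
    { $toUpper: { $substrCP: [fieldPath, 0, 1] } },
    { $substrCP: [fieldPath, 1, { $subtract: [{ $strLenCP: fieldPath }, 1] }] }
  ];
}

// Assumptions: collection "persons", new field spelled "fullName".
const capitalizedFullName = [
  {
    $project: {
      _id: 0,
      gender: 1,
      fullName: {
        $concat: [
          ...capitalizedParts("$name.first"),
          " ",
          ...capitalizedParts("$name.last")
        ]
      }
    }
  }
];

// db.persons.aggregate(capitalizedFullName).pretty();
```

The helper produces the same nested documents you would get by copying the expression manually, with the field path appearing in three places per name.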