1
00:00:02,230 --> 00:00:03,480
Now here is the data

2
00:00:03,670 --> 00:00:06,540
as we have it in our dataset again.

3
00:00:06,580 --> 00:00:12,220
Now let's have a look at a different pipeline stage that allows us to transform every document instead of grouping

4
00:00:12,220 --> 00:00:13,260
multiple together

5
00:00:13,450 --> 00:00:16,110
and that will be the project stage.

6
00:00:16,120 --> 00:00:19,070
Now we already know projection from the find method,

7
00:00:19,150 --> 00:00:23,200
well, as an aggregation stage, $project becomes more powerful,

8
00:00:23,320 --> 00:00:25,930
so what can project do for us?

9
00:00:26,140 --> 00:00:30,710
Let's now start simple and not have any other stage, I should say we don't do any filtering,

10
00:00:30,730 --> 00:00:33,700
we just want to transform every document.

11
00:00:33,760 --> 00:00:41,580
Then we can use the project stage and as with all stages, the value for $project here is simply a document

12
00:00:41,790 --> 00:00:44,790
where you, well, configure that stage, so to say

13
00:00:45,090 --> 00:00:53,230
and in its most simple form, project works in the same way as the projection works in the find method.

14
00:00:53,430 --> 00:00:58,560
So you can for example say I don't want to have the ID by setting it to zero here,

15
00:00:58,830 --> 00:01:06,600
you could say I only want to have the gender, I want to have the name field and location and email but

16
00:01:06,600 --> 00:01:10,580
I want to reformat some of these fields. So including gender is simple,

17
00:01:10,620 --> 00:01:11,730
we just add a one

18
00:01:11,970 --> 00:01:14,670
but let's say the name should be different

19
00:01:14,670 --> 00:01:20,670
now, let's say the name should be one field instead of this embedded document

20
00:01:20,820 --> 00:01:25,810
and this is something we can easily do with this project stage.
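
A minimal sketch of the include/exclude form described so far, before any field is reshaped. The collection name `persons` is an assumption (the lecture does not name it here), and the stage is kept in a plain variable so the snippet can also be evaluated outside the shell.

```javascript
// Simplest form of $project: the same include/exclude flags as a find() projection.
const basicProjection = [
  { $project: { _id: 0, gender: 1 } } // 0 excludes a field, 1 includes it
];

// In mongosh this would be run as (collection name "persons" is assumed):
//   db.persons.aggregate(basicProjection)
```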

21
00:01:25,830 --> 00:01:32,670
We can add new fields here which is a cool feature and we could add the full name field and that full

22
00:01:32,670 --> 00:01:42,330
name field could then simply be created on the fly based on the nested name first and name last fields.

23
00:01:42,330 --> 00:01:44,130
Now how would that work?

24
00:01:44,130 --> 00:01:50,070
There is a special operator we can use and you'll find that in the official docs, a full list of all the

25
00:01:50,280 --> 00:01:53,880
operators can be found in an article after this lecture.

26
00:01:54,060 --> 00:01:59,640
So you can use a special operator, so you pass an object first of all to full name because we're going

27
00:01:59,640 --> 00:02:01,050
to perform an operation

28
00:02:01,380 --> 00:02:08,550
and then here, we'll use concat. $concat allows you to concatenate strings and you

29
00:02:08,550 --> 00:02:12,270
simply pass an array here which contains these strings.

30
00:02:12,270 --> 00:02:18,330
Now you could hardcode some values in here like hello world,

31
00:02:18,360 --> 00:02:19,600
that is also possible,

32
00:02:19,620 --> 00:02:22,640
you can work with hardcoded data. If I now

33
00:02:22,650 --> 00:02:23,610
copy that,

34
00:02:25,610 --> 00:02:30,200
you'll see every person is named HelloWorld, concatenated together like this,

35
00:02:30,200 --> 00:02:32,920
also note that there is no whitespace in between.

36
00:02:32,920 --> 00:02:38,870
This is of course something you can do but most likely, this is not the result you want to have.
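
The hardcoded variant just described could look roughly like this. The field name `fullName` is an assumed spelling of the "full name" field, and `persons` is an assumed collection name.

```javascript
// $concat with two hardcoded strings: every output document gets the same value.
const hardcodedNameStage = {
  $project: {
    _id: 0,
    gender: 1,
    // The strings are joined as-is, with no separator, giving "HelloWorld".
    fullName: { $concat: ["Hello", "World"] }
  }
};

// mongosh (assumed collection name): db.persons.aggregate([hardcodedNameStage])
```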

37
00:02:39,190 --> 00:02:43,880
So instead what we want to do is we want to refer to the first and last keys here

38
00:02:43,990 --> 00:02:45,970
in the name field,

39
00:02:46,150 --> 00:02:51,700
so what we can do for that is we can use a special syntax again in a string

40
00:02:51,700 --> 00:02:54,740
and the quotation marks are important, with a dollar sign,

41
00:02:54,880 --> 00:02:57,500
we can refer to name first.

42
00:02:57,550 --> 00:03:02,530
Now the dollar sign simply tells MongoDB that this is not hardcoded text which it should

43
00:03:02,530 --> 00:03:08,290
take like this but that this refers to a field of the incoming document and that it should take the

44
00:03:08,290 --> 00:03:10,180
value of that field instead

45
00:03:10,240 --> 00:03:13,060
and in this case, it's just an embedded field,

46
00:03:13,060 --> 00:03:18,850
it could also be a normal field, a non-embedded one. So we can have the first name,

47
00:03:19,000 --> 00:03:21,760
then let's say we add a whitespace in-between

48
00:03:21,760 --> 00:03:27,910
and then here, we use name last to refer to this name here.
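
Written out, the field-reference version might look like the sketch below. `fullName` and the collection name `persons` are assumptions. The paths `name.first` and `name.last` follow the nested name document described in the lecture.

```javascript
// A string starting with "$" is read as a field path of the incoming document,
// not as literal text. The middle element is a plain one-space string.
const fullNameStage = {
  $project: {
    _id: 0,
    gender: 1,
    fullName: { $concat: ["$name.first", " ", "$name.last"] }
  }
};

// mongosh (assumed collection name): db.persons.aggregate([fullNameStage])
```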

49
00:03:29,160 --> 00:03:38,190
If we now take this code and we run that, we indeed get back documents, the same amount of documents as

50
00:03:38,200 --> 00:03:39,180
before by the way

51
00:03:39,190 --> 00:03:45,610
because unlike group, project does not group multiple documents together, it just transforms every single

52
00:03:45,610 --> 00:03:48,630
document and therefore we get the same amount of documents

53
00:03:48,700 --> 00:03:50,390
but with totally different data

54
00:03:50,590 --> 00:03:56,050
and the interesting part of course is that we can not only include and exclude data but that we can even

55
00:03:56,050 --> 00:04:02,920
add new fields with hardcoded values if we want to or, in our case, with a value derived from

56
00:04:02,920 --> 00:04:05,570
the data that was in the document before.

57
00:04:05,650 --> 00:04:11,970
Now let's say we also want to make sure that the first and last names start with uppercase characters

58
00:04:12,640 --> 00:04:15,520
and we can do that too with the projection phase,

59
00:04:15,550 --> 00:04:18,470
I'll just restructure this real quick to make it a bit easier to read,

60
00:04:18,670 --> 00:04:21,770
we'll have to make sure to move these up later.

61
00:04:21,770 --> 00:04:26,010
So right now, we just concatenate our first and last names

62
00:04:26,140 --> 00:04:32,170
but of course we could transform these first and last names too before we concatenate them, so we can

63
00:04:32,170 --> 00:04:39,130
pass more complex expressions to $concat essentially. The first name can be an expression described

64
00:04:39,130 --> 00:04:43,070
in a document and so can the last name.

65
00:04:43,150 --> 00:04:49,490
So in the first name here, we can use $toUpper which is another operator offered by

66
00:04:49,490 --> 00:04:50,800
MongoDB

67
00:04:50,870 --> 00:04:58,410
and $toUpper can simply receive the field which it should turn into uppercase, so name.first.

68
00:04:58,640 --> 00:05:05,860
Now it's the same syntax I need for the last element which we concatenate together, just that it's

69
00:05:05,860 --> 00:05:13,320
name.last here. Now $toUpper, this expression here and this expression, these will return strings so $concat

70
00:05:13,320 --> 00:05:19,930
will still work on three strings, just that the first and the last one are also well transformed by us.
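
A sketch of the nested form described here, where two of the three `$concat` elements are themselves expression documents. `fullName` and the collection name `persons` are assumptions.

```javascript
// Each { $toUpper: ... } document evaluates to a string, so $concat still
// receives three strings: uppercased first name, a space, uppercased last name.
const upperCaseNameStage = {
  $project: {
    _id: 0,
    gender: 1,
    fullName: {
      $concat: [
        { $toUpper: "$name.first" },
        " ",
        { $toUpper: "$name.last" }
      ]
    }
  }
};

// mongosh (assumed collection name): db.persons.aggregate([upperCaseNameStage])
```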

71
00:05:19,930 --> 00:05:24,140
Let me now move these two methods up and let's copy that,

72
00:05:24,580 --> 00:05:29,170
if we now run our aggregation, well everything is uppercase

73
00:05:29,170 --> 00:05:33,560
now, certainly a change but what if we only want to work with the first characters?

74
00:05:33,880 --> 00:05:36,320
Well then we drill into that further.

75
00:05:36,580 --> 00:05:42,140
Now we don't just run to upper on the entire first name or entire last name

76
00:05:42,280 --> 00:05:50,020
instead we want to run it on just the first character of these names and then we want to append the rest

77
00:05:50,020 --> 00:05:51,990
of that word in

78
00:05:52,030 --> 00:05:59,150
well the normal casing. For this, the thing which I transform with to upper is the result of yet another

79
00:05:59,230 --> 00:06:01,490
expression wrapped into a document,

80
00:06:01,540 --> 00:06:05,920
you see a pattern here, all these expressions are always wrapped into documents.

81
00:06:05,920 --> 00:06:13,940
Now there I use another operator, the substrCP operator which returns the substring of well a string,

82
00:06:13,960 --> 00:06:15,680
so a part of a string.

83
00:06:15,760 --> 00:06:17,600
SubstrCP takes an array,

84
00:06:17,650 --> 00:06:19,600
the first argument is the string,

85
00:06:19,660 --> 00:06:22,120
so here, I now use name first.

86
00:06:22,300 --> 00:06:27,870
The second argument is the starting character of your substring, this will be zero

87
00:06:27,880 --> 00:06:30,610
because strings are zero-indexed,

88
00:06:30,610 --> 00:06:37,000
so this is the first character of the string and then it asks you for how many characters should be included

89
00:06:37,000 --> 00:06:37,880
in the substring

90
00:06:37,960 --> 00:06:39,410
and that should be one here,

91
00:06:39,430 --> 00:06:41,360
so just the first character.
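
As an illustration of the three-element `$substrCP` array just described, here is the expression for the uppercased first character on its own. The variable name is hypothetical, chosen only for this sketch.

```javascript
// $substrCP: [ string, start index (zero-based), number of characters ]
// Here: take 1 character starting at index 0, then uppercase it.
const upperFirstCharacter = {
  $toUpper: { $substrCP: ["$name.first", 0, 1] }
};
```

This expression document would take the place of the first element in the `$concat` array.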

92
00:06:41,380 --> 00:06:46,200
Now this will turn the first character of the name into an uppercase character,

93
00:06:46,270 --> 00:06:50,210
of course we don't just want to have the first character of the name,

94
00:06:50,320 --> 00:06:55,030
we also want to have the other characters, just not converted to uppercase.

95
00:06:55,360 --> 00:07:01,810
So I will add an additional element to the concat array and that should now be the rest of the first

96
00:07:01,810 --> 00:07:07,030
name because here, we're extracting the first character and we're converting it to uppercase,

97
00:07:07,030 --> 00:07:10,440
now we need the rest of that name. For that,

98
00:07:10,490 --> 00:07:16,780
I'll again use my $substrCP operator here because I need a substring of name first,

99
00:07:16,810 --> 00:07:18,150
I just need the other

100
00:07:18,220 --> 00:07:19,310
well half or

101
00:07:19,330 --> 00:07:24,030
well the rest of it and therefore, I start at character 1

102
00:07:24,100 --> 00:07:30,940
but now of course I need to find out how many characters I need and for that, I need to dynamically derive

103
00:07:30,940 --> 00:07:37,750
how long the name is and we can do this with another expression and it's not uncommon to have nested

104
00:07:37,780 --> 00:07:40,010
and more complex stages like this one.

105
00:07:40,030 --> 00:07:42,270
So don't worry if this looks very frightening,

106
00:07:42,490 --> 00:07:47,380
obviously this is a more complex transformation we're doing here but I find it super important for you

107
00:07:47,380 --> 00:07:53,290
to understand how you can work with all these expressions and operators, especially in the project phase

108
00:07:53,470 --> 00:07:56,270
where you do a lot of transformations typically.

109
00:07:56,290 --> 00:08:04,090
So now here, what I want to do is I want to use another operator, the subtract operator which simply

110
00:08:04,090 --> 00:08:06,010
returns the difference of two numbers,

111
00:08:06,010 --> 00:08:07,770
now why do I need that?

112
00:08:07,930 --> 00:08:16,420
Well, the numbers are passed in an array here and I need to find out the length of my name and then

113
00:08:16,420 --> 00:08:22,790
subtract one from that because we start after the first character in this substring.

114
00:08:22,810 --> 00:08:33,340
So here, I will now use another operator, strLenCP which calculates the length of a string and there,

115
00:08:33,350 --> 00:08:35,220
I point at name first

116
00:08:35,230 --> 00:08:40,130
again and then, I subtract one from that,

117
00:08:40,130 --> 00:08:44,170
so this is the second element in the subtract array.

118
00:08:44,190 --> 00:08:47,050
Ok so this is quite a complex operation

119
00:08:47,070 --> 00:08:52,530
but in the end what I do here is I simply retrieve the rest of the name by starting after the first

120
00:08:52,530 --> 00:08:53,130
character

121
00:08:53,220 --> 00:08:58,410
and then I find out how long that name is if I well reduce it by the one character,

122
00:08:58,410 --> 00:09:01,740
so if I subtract that one starting character.
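
The "rest of the name" expression described above, sketched on its own. The variable name is hypothetical. The operators are the ones named in the lecture.

```javascript
// Start at index 1 (right after the first character) and take
// (length of the name - 1) characters, i.e. everything that is left.
const restOfFirstName = {
  $substrCP: [
    "$name.first",
    1,
    { $subtract: [{ $strLenCP: "$name.first" }, 1] }
  ]
};
```

For a five-character name, for example, `$strLenCP` yields 5 and `$subtract` yields 4, so the substring covers the four characters after the first one.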

123
00:09:01,740 --> 00:09:05,280
Now previously we had Gideon Van Drongelen,

124
00:09:05,430 --> 00:09:11,700
let's now repeat this for the last name and see if we have Gideon Van Drongelen where only the

125
00:09:11,700 --> 00:09:14,620
first character is an uppercase character.

126
00:09:14,640 --> 00:09:20,990
So let me grab this entire code and let's replace this code with it

127
00:09:21,170 --> 00:09:27,700
and now we just need to replace name first with name last in all three places and that should be it.

128
00:09:28,070 --> 00:09:36,440
Let's now move pretty up and aggregate up and copy that entire command,

129
00:09:36,470 --> 00:09:39,230
let's move over and execute it

130
00:09:39,280 --> 00:09:41,010
and this looks much better.
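
Putting the pieces together, the finished stage might look like the sketch below. Rather than pasting the expression twice and swapping the field in three places, it uses a small helper. `capitalizedParts` is a hypothetical name introduced only for this illustration, and `fullName` and the collection name `persons` are likewise assumptions.

```javascript
// Hypothetical helper: returns the two $concat elements for one field,
// the uppercased first character followed by the untouched rest.
function capitalizedParts(fieldPath) {
  return [
    { $toUpper: { $substrCP: [fieldPath, 0, 1] } },
    { $substrCP: [fieldPath, 1, { $subtract: [{ $strLenCP: fieldPath }, 1] }] }
  ];
}

const capitalizedNamePipeline = [
  {
    $project: {
      _id: 0,
      gender: 1,
      fullName: {
        $concat: [
          ...capitalizedParts("$name.first"),
          " ",
          ...capitalizedParts("$name.last")
        ]
      }
    }
  }
];

// mongosh (assumed collection name): db.persons.aggregate(capitalizedNamePipeline)
```

The helper only builds plain objects, so the pipeline sent to the server is the same nested document you would get by writing everything out by hand.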

131
00:09:41,050 --> 00:09:44,170
Now we have only the first characters of each name

132
00:09:44,170 --> 00:09:49,630
converted to uppercase characters and you'll learn a lot about operators and how you typically work with

133
00:09:49,630 --> 00:09:55,990
them and combine them especially in the project phase which is all about transforming data.