1 00:00:01,150 --> 00:00:02,540 So in the last lecture, 2 00:00:02,540 --> 00:00:04,990 we learned a theory about data modeling. 3 00:00:04,990 --> 00:00:07,430 And, so, let's now use that theory 4 00:00:07,430 --> 00:00:09,930 in order to actually design the data model 5 00:00:09,930 --> 00:00:12,140 of our Natours application. 6 00:00:12,140 --> 00:00:15,160 And this is for me and for many other developers 7 00:00:15,160 --> 00:00:18,400 actually the most difficult part of building an app. 8 00:00:18,400 --> 00:00:21,570 And, so, I hope that this application will serve 9 00:00:21,570 --> 00:00:24,660 as a good example to you and to give you the knowledge 10 00:00:24,660 --> 00:00:27,860 in order to later design your own data models, 11 00:00:27,860 --> 00:00:29,663 basically completely on your own. 12 00:00:30,640 --> 00:00:32,130 So let's do it now. 13 00:00:32,130 --> 00:00:34,560 And let's start with all the datasets 14 00:00:34,560 --> 00:00:37,690 that we actually need in our application. 15 00:00:37,690 --> 00:00:39,430 So starting with the tours, 16 00:00:39,430 --> 00:00:41,630 and that's of course the most obvious one. 17 00:00:41,630 --> 00:00:44,730 And we already have this one implemented. 18 00:00:44,730 --> 00:00:47,150 Then also we need some users. 19 00:00:47,150 --> 00:00:50,590 And, again, we already have a users collection actually 20 00:00:50,590 --> 00:00:51,870 in our database. 21 00:00:51,870 --> 00:00:54,020 And, so, basically tours and users 22 00:00:54,020 --> 00:00:56,470 are two completely separate datasets. 23 00:00:56,470 --> 00:00:58,270 And, so, we have them normalized. 24 00:00:58,270 --> 00:01:00,593 And of course they're not gonna be embedded. 25 00:01:01,540 --> 00:01:04,270 Next up, we're also gonna have reviews, 26 00:01:04,270 --> 00:01:06,360 and we will also have locations. 27 00:01:06,360 --> 00:01:07,300 Okay? 
28 00:01:07,300 --> 00:01:09,380 Because most tours actually have a number 29 00:01:09,380 --> 00:01:10,930 of different locations. 30 00:01:10,930 --> 00:01:11,763 Okay? 31 00:01:11,763 --> 00:01:14,600 And, so, that again is yet another dataset. 32 00:01:14,600 --> 00:01:17,300 And finally, we're also gonna have bookings. 33 00:01:17,300 --> 00:01:20,780 But a little bit more about why that is in a second. 34 00:01:20,780 --> 00:01:23,320 Okay, so, we have all these datasets. 35 00:01:23,320 --> 00:01:25,950 Now let's actually model the relationships 36 00:01:25,950 --> 00:01:27,480 that exist between them. 37 00:01:27,480 --> 00:01:29,100 And I'm gonna start with the relationship 38 00:01:29,100 --> 00:01:31,470 between users and reviews. 39 00:01:31,470 --> 00:01:36,100 And this relationship is clearly a one-to-many relationship 40 00:01:36,100 --> 00:01:39,260 because one user can write multiple reviews, 41 00:01:39,260 --> 00:01:42,360 but one review can only belong to one user. 42 00:01:42,360 --> 00:01:45,550 And the parent in this relationship is clearly the users, 43 00:01:45,550 --> 00:01:47,240 and the child, the reviews 44 00:01:47,240 --> 00:01:51,160 because again it's the parent, so the users in this case, 45 00:01:51,160 --> 00:01:53,560 who can be related to many reviews, 46 00:01:53,560 --> 00:01:56,730 but one review can only be related to one user. 47 00:01:56,730 --> 00:01:59,290 Anyway, I chose to model this relationship 48 00:01:59,290 --> 00:02:01,160 using parent referencing. 49 00:02:01,160 --> 00:02:04,830 And that's because a user can write a lot of reviews 50 00:02:04,830 --> 00:02:07,490 and also because we might actually need to query 51 00:02:07,490 --> 00:02:09,600 only for the reviews on their own. 52 00:02:09,600 --> 00:02:12,490 So the data access pattern is really important 53 00:02:12,490 --> 00:02:16,300 to take into consideration in this particular relationship. 
54 00:02:16,300 --> 00:02:18,940 Now, about the kind of referencing that we're gonna use, 55 00:02:18,940 --> 00:02:20,610 it is parent referencing, 56 00:02:20,610 --> 00:02:24,220 so basically the review keeping a reference of the user. 57 00:02:24,220 --> 00:02:26,670 So keeping an ID, basically. 58 00:02:26,670 --> 00:02:28,220 And that is as you already know 59 00:02:28,220 --> 00:02:32,510 because we do not want to allow arrays to grow indefinitely. 60 00:02:32,510 --> 00:02:33,940 And that might be the case 61 00:02:33,940 --> 00:02:37,860 if a user writes tons and tons (laughs) of reviews. 62 00:02:37,860 --> 00:02:38,930 Okay? 63 00:02:38,930 --> 00:02:41,790 Also, it's nice to have the review knowing 64 00:02:41,790 --> 00:02:43,220 who actually wrote it. 65 00:02:43,220 --> 00:02:44,053 Okay? 66 00:02:44,053 --> 00:02:46,440 And, so, having the user ID right on the review 67 00:02:46,440 --> 00:02:48,273 will allow us to do just that. 68 00:02:49,120 --> 00:02:49,953 All right. 69 00:02:49,953 --> 00:02:51,060 Next up, let's take a look 70 00:02:51,060 --> 00:02:54,310 at the relationship between tours and reviews. 71 00:02:54,310 --> 00:02:56,580 And this one is actually very similar. 72 00:02:56,580 --> 00:02:59,450 So, again, it's a one-to-many relationship, 73 00:02:59,450 --> 00:03:02,070 where one tour can have multiple reviews 74 00:03:02,070 --> 00:03:05,260 but one review can only be about one tour. 75 00:03:05,260 --> 00:03:06,093 Right? 76 00:03:06,093 --> 00:03:07,810 So that's the way it makes sense. 77 00:03:07,810 --> 00:03:11,180 And, so, we're actually gonna model it in the exact same way 78 00:03:11,180 --> 00:03:13,380 as the user-reviews relationship. 79 00:03:13,380 --> 00:03:15,460 So, again, parent referencing, 80 00:03:15,460 --> 00:03:17,670 so that in the end the reviews end up 81 00:03:17,670 --> 00:03:20,530 with a tour ID and a user ID. 
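To make the parent referencing idea a bit more concrete, here is a minimal sketch in plain JavaScript. It uses ordinary objects as stand-ins for MongoDB documents rather than real Mongoose schemas, and every field name, ID, and helper function in it (`sampleUsers`, `sampleTours`, `sampleReviews`, `reviewsForTour`, `reviewsByUser`) is an assumption made up for illustration, not something taken from the Natours code.

```javascript
// Sketch only: plain objects standing in for documents in three collections.
// All names, IDs, and values below are hypothetical.
const sampleUsers = [
  { _id: 'u1', name: 'First User' },
  { _id: 'u2', name: 'Second User' },
];

const sampleTours = [
  { _id: 't1', name: 'First Tour' },
  { _id: 't2', name: 'Second Tour' },
];

// Parent referencing: each review (the child) stores the IDs of its two
// parents. Neither the user nor the tour keeps an array of reviews, so
// nothing on the parents can grow indefinitely.
const sampleReviews = [
  { _id: 'r1', review: 'Great trip', rating: 5, tour: 't1', user: 'u1' },
  { _id: 'r2', review: 'Pretty good', rating: 4, tour: 't2', user: 'u1' },
  { _id: 'r3', review: 'Loved it', rating: 5, tour: 't1', user: 'u2' },
];

// Reviews can be queried on their own, filtered by either parent ID.
function reviewsForTour(tourId) {
  return sampleReviews.filter((r) => r.tour === tourId);
}

function reviewsByUser(userId) {
  return sampleReviews.filter((r) => r.user === userId);
}

console.log(reviewsForTour('t1').map((r) => r._id));
console.log(reviewsByUser('u1').map((r) => r._id));
```

The point of the sketch is only the shape of the data: the reference lives on the review, so looking up "all reviews for a tour" or "all reviews by a user" is a query on the reviews themselves.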
82 00:03:20,530 --> 00:03:23,270 And, so, then once we query for reviews, 83 00:03:23,270 --> 00:03:25,040 we always know exactly. 84 00:03:25,040 --> 00:03:27,930 Great, so let's now talk about the relationship 85 00:03:27,930 --> 00:03:30,800 between tours and locations. 86 00:03:30,800 --> 00:03:32,230 So as I mentioned earlier, 87 00:03:32,230 --> 00:03:35,230 each tour is gonna have a couple of locations. 88 00:03:35,230 --> 00:03:38,680 So for example, the park camper will basically stop 89 00:03:38,680 --> 00:03:41,080 in like three or four national parks. 90 00:03:41,080 --> 00:03:43,150 And, so, each of these national parks 91 00:03:43,150 --> 00:03:45,120 is gonna be one location. 92 00:03:45,120 --> 00:03:45,953 Right? 93 00:03:45,953 --> 00:03:49,700 And, so, each tour will basically have a few locations. 94 00:03:49,700 --> 00:03:52,730 Now, following that example, one of these national parks 95 00:03:52,730 --> 00:03:55,930 might also be part of one of the other tours. 96 00:03:55,930 --> 00:03:58,260 And, so, basically this relationship here 97 00:03:58,260 --> 00:04:00,770 is a few-to-few relationship. 98 00:04:00,770 --> 00:04:03,630 And we called this relationship many-to-many before 99 00:04:03,630 --> 00:04:06,480 but we still can also call them few-to-few 100 00:04:06,480 --> 00:04:08,910 or a ton to a ton. 101 00:04:08,910 --> 00:04:10,850 And, so, I called them few-to-few 102 00:04:10,850 --> 00:04:15,290 because each tour is only gonna have three, four locations 103 00:04:15,290 --> 00:04:17,460 but not really like 100. 104 00:04:17,460 --> 00:04:18,370 Okay? 105 00:04:18,370 --> 00:04:21,540 And, again, each of the locations can also be part 106 00:04:21,540 --> 00:04:23,060 of another tour. 107 00:04:23,060 --> 00:04:26,210 Now, this could be a good example for actually implementing 108 00:04:26,210 --> 00:04:30,670 two-way referencing, so basically normalizing the locations 109 00:04:30,670 --> 00:04:32,480 into its own dataset. 
110 00:04:32,480 --> 00:04:33,313 Right? 111 00:04:33,313 --> 00:04:36,330 But instead I'm actually gonna denormalize the locations 112 00:04:36,330 --> 00:04:39,270 so to embed them into the tours. 113 00:04:39,270 --> 00:04:41,350 And that's actually for multiple reasons. 114 00:04:41,350 --> 00:04:44,500 First, because there are only so few locations. 115 00:04:44,500 --> 00:04:47,400 Also, we're not really gonna access the locations 116 00:04:47,400 --> 00:04:48,690 on their own. 117 00:04:48,690 --> 00:04:51,890 And, finally, these locations are intrinsically related 118 00:04:51,890 --> 00:04:55,400 to the tours because really without locations 119 00:04:55,400 --> 00:04:57,280 there couldn't be any tours. 120 00:04:57,280 --> 00:04:58,113 Right? 121 00:04:58,113 --> 00:05:00,480 So these datasets belong closely together. 122 00:05:00,480 --> 00:05:04,030 And, so, I chose to embed locations into tours 123 00:05:04,030 --> 00:05:06,580 and not create yet another collection for these. 124 00:05:06,580 --> 00:05:07,413 Right? 125 00:05:07,413 --> 00:05:10,750 So we will have one collection for tours, one for users, 126 00:05:10,750 --> 00:05:13,330 and a bit later we will also create a new collection 127 00:05:13,330 --> 00:05:14,710 for the reviews. 128 00:05:14,710 --> 00:05:15,543 All right? 129 00:05:15,543 --> 00:05:18,860 But not for locations, again, because these will be embedded 130 00:05:18,860 --> 00:05:19,793 into the tours. 131 00:05:20,640 --> 00:05:23,710 Okay, and next up there's also a relationship 132 00:05:23,710 --> 00:05:26,250 between the tours and the users. 133 00:05:26,250 --> 00:05:28,780 And that's because we're gonna have tour guides 134 00:05:28,780 --> 00:05:33,150 in the tours, and these tour guides will actually be users. 135 00:05:33,150 --> 00:05:36,270 So remember how we actually gave users a role 136 00:05:36,270 --> 00:05:37,760 in our Mongoose schema? 
137 00:05:37,760 --> 00:05:40,770 And the possibilities there contained the guide 138 00:05:40,770 --> 00:05:43,020 and lead guide, remember? 139 00:05:43,020 --> 00:05:44,670 And, so, there's gonna be a relationship 140 00:05:44,670 --> 00:05:48,210 between these types of users and the tours. 141 00:05:48,210 --> 00:05:52,240 Now, this relationship is again a few-to-few relationship 142 00:05:52,240 --> 00:05:55,550 because one tour can have only a few users, 143 00:05:55,550 --> 00:05:58,410 so a few tour guides, but at the same time, 144 00:05:58,410 --> 00:06:02,150 each tour guide can also be guiding a few tours. 145 00:06:02,150 --> 00:06:02,983 All right? 146 00:06:02,983 --> 00:06:06,490 And, so, again, there's a many-to-many relationship here, 147 00:06:06,490 --> 00:06:09,270 which I simply called here few-to-few. 148 00:06:09,270 --> 00:06:12,140 Now, about actually modeling this relationship, 149 00:06:12,140 --> 00:06:14,410 we could do it in two ways. 150 00:06:14,410 --> 00:06:17,280 We could use referencing or embedding. 151 00:06:17,280 --> 00:06:19,620 And actually I'm gonna show you how to implement 152 00:06:19,620 --> 00:06:22,830 both child referencing and embedding using Mongoose 153 00:06:22,830 --> 00:06:24,410 throughout this section. 154 00:06:24,410 --> 00:06:25,620 Okay? 155 00:06:25,620 --> 00:06:28,800 And the argument for embedding is that in this case 156 00:06:28,800 --> 00:06:31,930 we could then have all the information about each tour 157 00:06:31,930 --> 00:06:34,310 containing the information about tour guides 158 00:06:34,310 --> 00:06:36,700 right on each tour document. 
159 00:06:36,700 --> 00:06:38,710 But on the other hand, that would then create 160 00:06:38,710 --> 00:06:41,120 some extra information in the database 161 00:06:41,120 --> 00:06:43,670 because we will still need to have the users 162 00:06:43,670 --> 00:06:45,210 as a separate collection 163 00:06:45,210 --> 00:06:48,700 simply because we need to access them all the time 164 00:06:48,700 --> 00:06:51,250 for user authentication and authorization 165 00:06:51,250 --> 00:06:52,510 and all that stuff. 166 00:06:52,510 --> 00:06:56,290 So usually, users are always an entity on their own 167 00:06:56,290 --> 00:06:57,700 in each database. 168 00:06:57,700 --> 00:06:58,533 Okay? 169 00:06:58,533 --> 00:07:02,380 But we could still embed some of the users into the tours. 170 00:07:02,380 --> 00:07:04,750 So basically when the user is a tour guide 171 00:07:04,750 --> 00:07:08,190 for a specific tour, we could then copy all this data 172 00:07:08,190 --> 00:07:09,950 into the tour document. 173 00:07:09,950 --> 00:07:10,783 Okay? 174 00:07:10,783 --> 00:07:14,230 But also we would then have to update the user on the tour 175 00:07:14,230 --> 00:07:17,590 each time that the underlying user itself changes. 176 00:07:17,590 --> 00:07:19,710 So let's say that the role of a user changes 177 00:07:19,710 --> 00:07:21,690 from guide to a lead guide. 178 00:07:21,690 --> 00:07:24,410 And in that case, we would then have to go to the tour 179 00:07:24,410 --> 00:07:26,850 and also update that role information 180 00:07:26,850 --> 00:07:28,840 right there on the embedded data. 181 00:07:28,840 --> 00:07:29,673 Okay? 182 00:07:29,673 --> 00:07:32,320 And, so, that's not ideal, and so we're actually 183 00:07:32,320 --> 00:07:35,350 also gonna then implement child referencing. 
184 00:07:35,350 --> 00:07:37,280 And, so, with that, we can still keep 185 00:07:37,280 --> 00:07:39,590 basically the information about the tour guides 186 00:07:39,590 --> 00:07:42,860 on the tours but simply in a referenced form, 187 00:07:42,860 --> 00:07:44,930 so basically keeping the IDs there, 188 00:07:44,930 --> 00:07:47,630 which are then gonna point to the users. 189 00:07:47,630 --> 00:07:48,463 Okay? 190 00:07:48,463 --> 00:07:51,370 And of course we could also use two-way referencing, 191 00:07:51,370 --> 00:07:55,100 so also keeping an ID of the tour right on the user. 192 00:07:55,100 --> 00:07:56,650 But I think that's a bit too much 193 00:07:56,650 --> 00:07:59,140 for this kind of small example 194 00:07:59,140 --> 00:08:02,850 because not all users will actually need an ID of the tour 195 00:08:02,850 --> 00:08:05,580 because not all users are tour guides. 196 00:08:05,580 --> 00:08:08,870 And, so, this relationship here is a bit tricky to model, 197 00:08:08,870 --> 00:08:10,800 I think, but I believe that in the end 198 00:08:10,800 --> 00:08:14,200 child referencing is gonna be the best way to go. 199 00:08:14,200 --> 00:08:15,033 Okay? 200 00:08:15,033 --> 00:08:17,220 But still, I'm gonna also show you embedding 201 00:08:17,220 --> 00:08:20,120 because I think that's also important to learn. 202 00:08:20,120 --> 00:08:21,400 All right? 203 00:08:21,400 --> 00:08:23,530 Next up, we have our bookings. 204 00:08:23,530 --> 00:08:26,130 And basically a new booking will be created 205 00:08:26,130 --> 00:08:29,340 each time that a user purchases a tour. 206 00:08:29,340 --> 00:08:31,340 So this is still kind of a relationship 207 00:08:31,340 --> 00:08:33,240 between users and tours 208 00:08:33,240 --> 00:08:36,950 because again it's a user who is gonna buy a tour. 
209 00:08:36,950 --> 00:08:38,810 But we also want to store some data 210 00:08:38,810 --> 00:08:40,920 about that relationship itself, 211 00:08:40,920 --> 00:08:44,450 so in this case about the purchase itself in our database. 212 00:08:44,450 --> 00:08:46,430 For example, the price or the date 213 00:08:46,430 --> 00:08:49,560 when the purchase happened or something like that. 214 00:08:49,560 --> 00:08:50,810 And, so, in cases like this, 215 00:08:50,810 --> 00:08:53,750 it's a good idea to create an extra dataset, 216 00:08:53,750 --> 00:08:55,920 which in this case is the bookings. 217 00:08:55,920 --> 00:08:56,753 Okay? 218 00:08:56,753 --> 00:08:58,710 And, so, of course there will be a relationship 219 00:08:58,710 --> 00:09:02,398 between tours and bookings and also users and bookings. 220 00:09:02,398 --> 00:09:06,150 And, again, because basically the booking connects tours 221 00:09:06,150 --> 00:09:09,763 with users but kind of with an intermediate step. 222 00:09:09,763 --> 00:09:12,530 So one tour can have many bookings, 223 00:09:12,530 --> 00:09:15,760 but one booking can only belong to one tour. 224 00:09:15,760 --> 00:09:17,350 And the same thing with users. 225 00:09:17,350 --> 00:09:19,870 So one user can book many tours, 226 00:09:19,870 --> 00:09:23,610 but one booking can only belong to one of the users. 227 00:09:23,610 --> 00:09:26,380 And, so, of course we have a one-to-many relationship 228 00:09:26,380 --> 00:09:29,080 in both cases, and also in both cases, 229 00:09:29,080 --> 00:09:31,140 we're gonna use parent referencing. 230 00:09:31,140 --> 00:09:33,610 And, so, that means that on each booking 231 00:09:33,610 --> 00:09:37,640 we're gonna keep an ID of both the tour that was purchased 232 00:09:37,640 --> 00:09:40,270 and also of the user who actually purchased the tour. 233 00:09:40,270 --> 00:09:41,103 Okay? 
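The bookings idea described here can be sketched in the same plain-JavaScript style. This is just an illustration under stated assumptions: the objects stand in for documents, and the field names (`tour`, `user`, `price`, `createdAt`), the IDs, the prices, the dates, and the helpers `bookingsForTour` and `toursBookedBy` are all hypothetical.

```javascript
// Sketch only: bookings as an intermediate dataset between users and tours.
// Every value below is made up for illustration.
const sampleBookings = [
  { _id: 'b1', tour: 't1', user: 'u1', price: 100, createdAt: '2024-01-10' },
  { _id: 'b2', tour: 't2', user: 'u1', price: 250, createdAt: '2024-02-03' },
  { _id: 'b3', tour: 't1', user: 'u2', price: 100, createdAt: '2024-02-20' },
];

// Parent referencing in both directions: the booking holds the tour ID and
// the user ID, plus data about the purchase itself (price, date).
function bookingsForTour(tourId) {
  return sampleBookings.filter((b) => b.tour === tourId);
}

// Which tours did a user buy? Collect the distinct tour IDs on their bookings.
function toursBookedBy(userId) {
  const tourIds = sampleBookings
    .filter((b) => b.user === userId)
    .map((b) => b.tour);
  return [...new Set(tourIds)];
}

console.log(bookingsForTour('t1').map((b) => b._id));
console.log(toursBookedBy('u1'));
```

Because the purchase data sits on the booking, neither the tour nor the user document needs to carry any information about who bought what.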
234 00:09:41,103 --> 00:09:42,930 And, so, in this case, I'm doing it this way 235 00:09:42,930 --> 00:09:46,140 because basically I don't want to pollute the tour data 236 00:09:46,140 --> 00:09:49,510 with information about who actually bought the tour. 237 00:09:49,510 --> 00:09:50,343 Right? 238 00:09:50,343 --> 00:09:53,157 It wouldn't be really relevant to the tour data itself. 239 00:09:53,157 --> 00:09:55,070 And the same thing with users. 240 00:09:55,070 --> 00:09:58,370 So we also don't want to pollute the users object 241 00:09:58,370 --> 00:10:00,740 with all of the bookings that they did. 242 00:10:00,740 --> 00:10:01,573 All right? 243 00:10:01,573 --> 00:10:03,000 And, so, instead again 244 00:10:03,000 --> 00:10:05,770 we're gonna create an intermediate object 245 00:10:05,770 --> 00:10:08,450 or an intermediate dataset that's going to stand 246 00:10:08,450 --> 00:10:12,520 between users and tours whenever they create a new purchase. 247 00:10:12,520 --> 00:10:13,353 Right? 248 00:10:13,353 --> 00:10:14,590 Make sense? 249 00:10:14,590 --> 00:10:17,520 And that's actually it for our data model. 250 00:10:17,520 --> 00:10:21,370 And of course, this now looks kind of abstract, 251 00:10:21,370 --> 00:10:23,150 but once we start implementing it, 252 00:10:23,150 --> 00:10:24,660 it's gonna be very helpful 253 00:10:24,660 --> 00:10:28,730 to have all of our ideas organized into something like this. 254 00:10:28,730 --> 00:10:31,310 So whenever this data model that we're gonna implement 255 00:10:31,310 --> 00:10:34,560 throughout this section is becoming a bit confusing to you, 256 00:10:34,560 --> 00:10:36,970 then just reference back to this slide. 257 00:10:36,970 --> 00:10:39,080 Or you can maybe even print it out 258 00:10:39,080 --> 00:10:40,980 if that makes it easier for you. 259 00:10:40,980 --> 00:10:43,960 So this is our data model in theory. 
260 00:10:43,960 --> 00:10:46,080 And now throughout the rest of the course, 261 00:10:46,080 --> 00:10:48,870 I will give you the tools to actually model the data 262 00:10:48,870 --> 00:10:50,543 using the Mongoose Library.
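As a rough picture of what a single tour could look like under this model, here is one more plain-JavaScript sketch: locations embedded directly in the tour, and tour guides kept as child references (an array of user IDs). It is not the Mongoose implementation from the course; the field names, the role strings `'guide'`, `'lead-guide'`, and `'user'`, and the helper `resolveGuides` are assumptions for illustration only.

```javascript
// Sketch only: users stay in their own collection (needed for authentication).
// Names, IDs, and role strings are hypothetical.
const staffAndCustomers = [
  { _id: 'u1', name: 'Guide One', role: 'lead-guide' },
  { _id: 'u2', name: 'Guide Two', role: 'guide' },
  { _id: 'u3', name: 'Customer', role: 'user' },
];

const exampleTour = {
  _id: 't1',
  name: 'Example Tour',
  // Embedded (denormalized): only a few locations, never queried on their own.
  locations: [
    { description: 'First national park', day: 1 },
    { description: 'Second national park', day: 3 },
    { description: 'Third national park', day: 5 },
  ],
  // Child referencing: the tour keeps only the IDs of its guides.
  guides: ['u1', 'u2'],
};

// Look up the referenced users at read time.
function resolveGuides(tourDoc, allUsers) {
  return tourDoc.guides.map((id) => allUsers.find((u) => u._id === id));
}

// A role change is made once, on the user. Because the tour stores only IDs,
// there is no embedded copy on the tour that would also need updating.
staffAndCustomers[1].role = 'lead-guide';

console.log(resolveGuides(exampleTour, staffAndCustomers).map((u) => u.role));
```

Had the guides been embedded as full copies on the tour, the role change above would also have to be repeated on every tour containing that guide, which is the drawback mentioned earlier.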