1 00:00:02,140 --> 00:00:09,760 Now with capped collections out of the way, let's turn towards replica sets. Replica sets are something you 2 00:00:09,760 --> 00:00:14,770 would create and manage as a database or system administrator 3 00:00:15,010 --> 00:00:17,260 but what are replica sets? 4 00:00:17,350 --> 00:00:24,350 Let's say we have our client, either the mongo shell we're using or some native driver for Node, PHP, C++, 5 00:00:24,340 --> 00:00:30,760 whatever it is. Now we want to write some data to our database and hence we send our write, 6 00:00:31,000 --> 00:00:38,170 our insert operation to the mongodb server which in the end talks to the primary node you could 7 00:00:38,170 --> 00:00:44,040 say. Now important is a node here simply is a mongodb server, 8 00:00:44,050 --> 00:00:50,210 so what we use thus far with this mongod command gave us a node, the only node we had. 9 00:00:50,500 --> 00:00:55,810 So the mongodb server is technically attached to that node but it's a bit easier to understand it like 10 00:00:55,810 --> 00:00:59,720 this, I believe. So we have that primary node and that is basically the setup we used for this entire 11 00:00:59,730 --> 00:01:03,340 course, we have our server which is one node. 12 00:01:03,760 --> 00:01:07,360 Now you can add more nodes, so-called secondary nodes, 13 00:01:07,450 --> 00:01:10,140 so these are additional database servers 14 00:01:10,150 --> 00:01:11,420 you could say you start up 15 00:01:11,440 --> 00:01:16,320 which are all tied together though in a so-called replica set. 16 00:01:16,330 --> 00:01:21,060 Now the idea here is that you always communicate with your primary node automatically, 17 00:01:21,070 --> 00:01:25,840 you don't need to do that manually, this happens automatically, if you send an insert command to your 18 00:01:25,870 --> 00:01:27,450 connected mongo server, 19 00:01:27,460 --> 00:01:34,240 it will automatically talk to the primary node but behind the scenes, the primary node will asynchronously 20 00:01:34,330 --> 00:01:40,570 replicate the data on the secondary nodes and asynchronously simply means that if you insert data, it's 21 00:01:40,570 --> 00:01:45,240 not immediately written to the secondary nodes but relatively soon. 22 00:01:45,520 --> 00:01:48,370 So you have this replication of data, 23 00:01:48,760 --> 00:01:50,730 now why do we replicate data? 24 00:01:50,950 --> 00:01:52,490 Well we do replicate data 25 00:01:52,600 --> 00:01:54,380 so that in this set up here, 26 00:01:54,550 --> 00:02:02,170 that if we read data and for some reason, our primary node should be offline, that we can reach out to 27 00:02:02,170 --> 00:02:10,210 a secondary node that will be then the elected new primary node, the secondary nodes in a replica set hold 28 00:02:10,300 --> 00:02:17,100 a so-called election when the primary node goes down to elect and select a new primary node 29 00:02:17,320 --> 00:02:23,030 and then we talk to that new primary node until our entire replica set is restored. 30 00:02:23,140 --> 00:02:31,120 So we get some fault tolerance in here because even if one of our servers you could say goes down, we can 31 00:02:31,120 --> 00:02:38,980 talk to another instance another node in that server network, in that cluster so to say to still read data 32 00:02:39,190 --> 00:02:44,780 and as a new primary, we can then also not just to read but also write data. 33 00:02:44,800 --> 00:02:51,940 So this is why we use replica sets, we have the back up and fault tolerance and we get better read performance 34 00:02:51,940 --> 00:02:52,560 as well and 35 00:02:52,600 --> 00:02:54,420 that is something I haven't talked about yet, 36 00:02:54,610 --> 00:03:00,920 we talked about the backup of data and therefore, the possibility to read data and also then to write 37 00:03:00,970 --> 00:03:02,150 to a new primary, 38 00:03:02,350 --> 00:03:09,160 well if we have this set up, this is of course fine if the primary node is online but even if it does 39 00:03:09,160 --> 00:03:17,560 not go offline, you can configure everything such that your backend will automatically distribute incoming 40 00:03:17,560 --> 00:03:22,810 read requests across all nodes and now we're talking just about read requests. 41 00:03:22,900 --> 00:03:30,250 The writes will always go to the primary node but read requests can be if the server is configured appropriately 42 00:03:30,340 --> 00:03:36,100 and that is a task of your system or database admin and that the reads can also talk to secondary nodes 43 00:03:36,280 --> 00:03:42,810 and the idea here is of course clear. You want to ensure that you can read your data as fast as possible 44 00:03:43,090 --> 00:03:50,200 and if you have an application where you have thousands of read requests per second, then it is awesome 45 00:03:50,200 --> 00:03:55,780 if you can read not just from one node which is still one computer who has to handle all of that but if you 46 00:03:55,780 --> 00:04:00,750 can read from multiple computers and therefore, you kind of split the load evenly. 47 00:04:01,000 --> 00:04:03,200 So that's the idea behind replica sets, 48 00:04:03,220 --> 00:04:05,530 we get the backup, the fault tolerance 49 00:04:05,650 --> 00:04:12,200 and we can even use the nodes and the replica set for improved read performance. 50 00:04:12,200 --> 00:04:15,340 Now how do you create such a replica set? Again 51 00:04:15,340 --> 00:04:20,820 this is an administrative task, we'll not go through that in this course since it's well a bit more advanced, 52 00:04:20,830 --> 00:04:25,420 not really something you'll have to worry about as a developer but when we deploy a mongodb 53 00:04:25,420 --> 00:04:26,580 solution. 54 00:04:26,650 --> 00:04:31,510 I will also show you a way of easily getting such a replica set up and running, 55 00:04:31,510 --> 00:04:33,490 so there I will cover that too.