1
00:00:02,140 --> 00:00:09,760
Now with capped collections out of the way, let's turn towards replica sets. Replica sets are something you

2
00:00:09,760 --> 00:00:14,770
would create and manage as a database or system administrator

3
00:00:15,010 --> 00:00:17,260
but what are replica sets?

4
00:00:17,350 --> 00:00:24,350
Let's say we have our client, either the mongo shell we're using or some native driver for Node, PHP, C++,

5
00:00:24,340 --> 00:00:30,760
whatever it is. Now we want to write some data to our database and hence we send our write,

6
00:00:31,000 --> 00:00:38,170
our insert operation, to the MongoDB server, which in the end talks to the primary node, you could

7
00:00:38,170 --> 00:00:44,040
say. Now importantly, a node here is simply a MongoDB server,

8
00:00:44,050 --> 00:00:50,210
so what we used thus far with the mongod command gave us a node, the only node we had.

9
00:00:50,500 --> 00:00:55,810
So the MongoDB server is technically attached to that node but it's a bit easier to understand it like

10
00:00:55,810 --> 00:00:59,720
this, I believe. So we have that primary node and that is basically the setup we used for this entire

11
00:00:59,730 --> 00:01:03,340
course, we have our server which is one node.

12
00:01:03,760 --> 00:01:07,360
Now you can add more nodes, so-called secondary nodes,

13
00:01:07,450 --> 00:01:10,140
so these are additional database servers

14
00:01:10,150 --> 00:01:11,420
that you start up, you could say,

15
00:01:11,440 --> 00:01:16,320
which are all tied together, though, in a so-called replica set.

16
00:01:16,330 --> 00:01:21,060
Now the idea here is that you always communicate with your primary node automatically,

17
00:01:21,070 --> 00:01:25,840
you don't need to do that manually, this happens automatically. If you send an insert command to your

18
00:01:25,870 --> 00:01:27,450
connected MongoDB server,

19
00:01:27,460 --> 00:01:34,240
it will automatically talk to the primary node but behind the scenes, the primary node will asynchronously

20
00:01:34,330 --> 00:01:40,570
replicate the data to the secondary nodes, and asynchronously simply means that if you insert data, it's

21
00:01:40,570 --> 00:01:45,240
not immediately written to the secondary nodes but relatively soon.

22
00:01:45,520 --> 00:01:48,370
So you have this replication of data,

23
00:01:48,760 --> 00:01:50,730
now why do we replicate data?

24
00:01:50,950 --> 00:01:52,490
Well we do replicate data

25
00:01:52,600 --> 00:01:54,380
so that in this setup here,

26
00:01:54,550 --> 00:02:02,170
if we read data and, for some reason, our primary node should be offline, we can reach out to

27
00:02:02,170 --> 00:02:10,210
a secondary node that will then be elected the new primary node. The secondary nodes in a replica set hold

28
00:02:10,300 --> 00:02:17,100
a so-called election when the primary node goes down to select a new primary node

29
00:02:17,320 --> 00:02:23,030
and then we talk to that new primary node until our entire replica set is restored.

30
00:02:23,140 --> 00:02:31,120
So we get some fault tolerance in here because even if one of our servers, you could say, goes down, we can

31
00:02:31,120 --> 00:02:38,980
talk to another instance, another node in that server network, in that cluster so to say, to still read data

32
00:02:39,190 --> 00:02:44,780
and with a new primary, we can then not just read but also write data.

33
00:02:44,800 --> 00:02:51,940
So this is why we use replica sets: we have the backup and fault tolerance and we get better read performance

34
00:02:51,940 --> 00:02:52,560
as well and

35
00:02:52,600 --> 00:02:54,420
that is something I haven't talked about yet,

36
00:02:54,610 --> 00:03:00,920
we talked about the backup of data and therefore, the possibility to read data and also then to write

37
00:03:00,970 --> 00:03:02,150
to a new primary,

38
00:03:02,350 --> 00:03:09,160
well, if we have this setup, this is of course fine if the primary node is online but even if it does

39
00:03:09,160 --> 00:03:17,560
not go offline, you can configure everything such that your backend will automatically distribute incoming

40
00:03:17,560 --> 00:03:22,810
read requests across all nodes and now we're talking just about read requests.

41
00:03:22,900 --> 00:03:30,250
The writes will always go to the primary node but read requests can, if the server is configured appropriately

42
00:03:30,340 --> 00:03:36,100
and that is a task of your system or database admin, also talk to secondary nodes

43
00:03:36,280 --> 00:03:42,810
and the idea here is of course clear. You want to ensure that you can read your data as fast as possible

44
00:03:43,090 --> 00:03:50,200
and if you have an application where you have thousands of read requests per second, then it is awesome

45
00:03:50,200 --> 00:03:55,780
if you can read not just from one node, which is still one computer that has to handle all of that, but if you

46
00:03:55,780 --> 00:04:00,750
can read from multiple computers and therefore, you kind of split the load evenly.

47
00:04:01,000 --> 00:04:03,200
So that's the idea behind replica sets,

48
00:04:03,220 --> 00:04:05,530
we get the backup, the fault tolerance

49
00:04:05,650 --> 00:04:12,200
and we can even use the nodes in the replica set for improved read performance.

50
00:04:12,200 --> 00:04:15,340
Now how do you create such a replica set? Again,

51
00:04:15,340 --> 00:04:20,820
this is an administrative task, we'll not go through that in this course since it's, well, a bit more advanced,

52
00:04:20,830 --> 00:04:25,420
not really something you'll have to worry about as a developer, but when we deploy a MongoDB

53
00:04:25,420 --> 00:04:26,580
solution,

54
00:04:26,650 --> 00:04:31,510
I will also show you a way of easily getting such a replica set up and running,

55
00:04:31,510 --> 00:04:33,490
so there I will cover that too.