We just saw some scenarios in which this whole event-based communication architecture seems to totally fall apart. In this video I'm going to answer a couple of questions you might have about all this stuff right from the get-go. We'll then take a pause, and we'll go through some possible ways of solving these problems in the video after this one. I'm going to try to keep this video a little bit shorter.

OK, so the first big question you might have. Remember, we're making use of async communication between our different services; that's why we're dealing with these events. We said earlier in the course that when we're working with microservices, we can have them communicate asynchronously with events, or synchronously with direct requests. So at this point it might sound like this async communication style is just awful, based on all the problems I just mentioned, and you might think it would be a lot easier if we just stuck to some kind of synchronous communication approach. Well, it turns out all the same stuff happens with synchronous communication. So at that point you might say, "The heck with this microservices stuff, let's just go back to building monolith-style apps." Well, it turns out all of this actually happens with monolithic applications too.

So let me show you a diagram just to expand on this first point. We're going to imagine that we're building the same kind of money-depositing application, but now with a more monolith-style approach where we're not exchanging events or anything like that. In this scenario, imagine a user makes three requests in a row: one request to deposit $70, one to deposit $40, and then one to withdraw $100. Now, even if we're using a monolith-style application, it's still incredibly likely that we're running multiple instances, or copies, of that application. So whenever a user makes these three requests, they'll probably hit a load balancer of sorts, and the requests will essentially be randomly assigned across the different instances. We can imagine that the first request goes to instance A, the second to instance B, and the third down to instance C. Each of these instances is then essentially going to race to process its incoming request.

So we might be in that same kind of scenario where maybe instance A and instance B have a lot of incoming traffic right now for whatever crazy reason, or maybe they're provisioned onto a virtual machine with lower specs than instance C down here. For whatever reason, we can very easily imagine that instance C might process the "withdraw $100" request first. So it would reach into the database and take a look at the user's balance: oh, it's zero right now. Well, once again, we're in some huge error. So just by going back to a monolith-style approach, we still deal with these concurrency issues.
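To make that race concrete, here's a minimal sketch of the kind of read-modify-write logic each instance of the monolith might be running. This is hypothetical illustration code, not anything from the course: the Db interface, handleTransaction, and all of the names are stand-ins. The point is just that between the read and the write, another instance can read the same stale balance.

```ts
// Hypothetical request handler shared by every instance of the monolith.
// `Db` stands in for a real database client; the names are illustrative only.
interface Db {
  getBalance(userId: string): Promise<number>;
  setBalance(userId: string, balance: number): Promise<void>;
}

async function handleTransaction(
  db: Db,
  userId: string,
  amount: number // positive for a deposit, negative for a withdrawal
): Promise<void> {
  // 1. Read the current balance.
  const balance = await db.getBalance(userId);

  // 2. Validate a withdrawal against the balance we just read.
  if (amount < 0 && balance + amount < 0) {
    throw new Error('Insufficient funds');
  }

  // 3. Write the new balance back.
  // The race: if instance C runs steps 1 through 3 for "withdraw $100" before
  // instances A and B have written their deposits, the balance it read in
  // step 1 is still $0, so the withdrawal is wrongly rejected. Two instances
  // can also read the same stale balance and overwrite each other's writes.
  await db.setBalance(userId, balance + amount);
}
```

Nothing about this depends on events or NATS; it's just a concurrent read-modify-write against shared state, which is exactly what the monolith's instances are doing here.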
It just turns out that when we start using this microservices, event-based approach, the same concurrency issues become a bit more prominent, because we're now adding the extra latency of the NATS server, and we're also talking about the possibility of automatic retries, or redeliveries, of these different events. Because the system becomes more complex, because there are additional communication hops inside of here, the whole concurrency issue becomes a little more prominent. Well, to be honest, a lot more prominent. But it's still an issue even if we went back to an old-style approach.

So next up, you might come up with an immediate solution. If I asked you, "Hey, solve this problem," here's one possible answer you might land on. It's a very common solution people come up with, and then they very quickly realize, "Oh wait, that won't quite work out." When we were going through all those different scenarios where everything would fail, a lot of the issues seemed to stem from the fact that we had two separate copies of a service processing events, because then there was a scenario where maybe one was slower than the other, or had communication issues with its file storage, or whatever else. So a very common proposal is: let's just run one copy of the service, one single instance. Think about what would happen if we did that. We throw the second instance out, and now we're just running one copy of the accounts service. As we start to publish events (deposit, deposit, withdraw), they'll all go to that one instance, which will process them one after another. But even then, there's still a possibility of failure here.

For example, on that first event we could have some issue opening up the file because of some temporary problem with the hard drive, so we might fail to process that first event entirely, and it essentially gets thrown back over. Well, it doesn't actually get thrown back over, but we never acknowledge it, and so NATS figures it has to deliver it again. In the meantime we might successfully process the $40 deposit and then the $100 withdrawal, and boom, we're right back to the same issue as before.
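For reference, here's a rough sketch of what that single-copy listener could look like, assuming the node-nats-streaming client used in this course. The subject name, queue group, client IDs, and the applyTransaction helper are all hypothetical stand-ins rather than the course's actual code. The detail that matters is manual ack mode: if processing fails we never call msg.ack(), so NATS redelivers the event after the ack wait expires, and by then later events may already have been applied.

```ts
import nats, { Message } from 'node-nats-streaming';

// Hypothetical processing step: in the video's example this would read the
// current balance from a file, apply the amount, and write the file back.
async function applyTransaction(data: { type: string; amount: number }): Promise<void> {
  // ...open the file, update the balance, save it...
}

const stan = nats.connect('ticketing', 'accounts-service', {
  url: 'http://localhost:4222',
});

stan.on('connect', () => {
  const options = stan
    .subscriptionOptions()
    .setManualAckMode(true) // we decide when an event counts as processed
    .setAckWait(5 * 1000);  // if we never ack, NATS redelivers after ~5 seconds

  const subscription = stan.subscribe(
    'transaction:created', // hypothetical subject name
    'accounts-service-queue-group',
    options
  );

  subscription.on('message', async (msg: Message) => {
    const data = JSON.parse(msg.getData() as string); // e.g. { type: 'deposit', amount: 70 }

    try {
      await applyTransaction(data);
      msg.ack(); // only acknowledge on success
    } catch (err) {
      // Temporary failure (say, the file cannot be opened): we do NOT ack.
      // NATS will redeliver this event later, quite possibly after the
      // "deposit $40" and "withdraw $100" events have already been processed,
      // so even a single instance can end up applying events out of order.
    }
  });
});
```

So a single copy narrows the window for trouble, but it doesn't close it, and as we're about to see, it creates a much bigger problem of its own.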
So this is not foolproof. At least it kind of addresses the issues we ran into when we were running multiple instances, but the remaining possibility of failure is not even the real big issue here. The big issue is that if we're only going to run one copy of the service, then all of a sudden we've got a processing bottleneck inside of our application. If we can only run one copy of the service, we are very severely constrained in how quickly our application can process data and in how we can scale it up. Remember, there are generally two ways to scale up an application. We can scale it vertically, where we increase the specs dedicated to it (the amount of CPU and processing power the service gets), or horizontally, where we create more copies of the service. So if we decide right from the outset that we're only ever going to run one copy of the accounts service, all of a sudden we cannot scale horizontally, and that can end up being a huge, catastrophic issue down the line as our app gets more popular and gets more traffic. So solution number one is not going to work; it's not an option. We always have to assume that we're going to be running multiple copies of any given service.

Possible solution number two is also not going to work. Now, I don't really have a distinct plan here; this is more just a note I want to throw out very quickly. For possible solution number two, you might try to figure out every possible concurrency issue inside of your app, and you might decide, "Hey, we're going to find all of these possible issues and write code to handle every single last one." Well, I just want you to get this in your head right now; I really want you to internalize this: inside of any application, there is a possibly infinite number of concurrency issues.
There are a lot of different things that can possibly go wrong, and you really cannot feasibly sit down and write code to handle every last issue. Now, of course, there are exceptions to this. If you're building some kind of spaceship, something that is absolutely critical in nature and always has to work no matter what, then of course there are exceptions. But if you're building some kind of Twitter clone or something like that, does it really matter if, say, two tweets end up out of order? Does it matter if two tweets are duplicated, or two forum posts, or two blog posts? Does that kind of stuff really matter at the end of the day? And are you going to dedicate a huge amount of engineering time and money to solving that problem? Well, I can tell you right now: you're probably not. So there's a certain point where you start to identify possible issues and say, "You know what, realistically, that is just not likely to happen." You know, it's not likely for a user to try to create five tweets at the same time while also deleting two of them and editing the other three, or something like that. So you really have to sit down and make an engineering judgment: is this actually worth trying to solve? Because a lot of the time you're inevitably going to say no, it's not. So this isn't really a solution per se; I just want to throw it out there right away. Because even in the app that we're going to build, you might start to say, "Hey, Stephen, wait a minute: in our ticketing application, if a user creates a ticket and then does this, and then this, and then this, they can end up with an issue if the timing is just right and some service fails at the same time." OK, I accept that there might be holes, there might be gaps. But again, at the end of the day, we probably cannot write code to capture every single case, because, number one, it just might not matter, and number two, it might take too much engineering time to fix.

OK, so again, I tried to keep this a short video. It was eight minutes, so I still kind of failed. Oh well. All right, let's pause right here. We'll come back in the next video and take a look at some possible strategies to solve these big issues.