Everything Will Flow

So we're here to talk about queues, and some of you may have wondered why, because queues really are pretty simple: you have a producer that enqueues messages, you have a consumer that dequeues messages, typically but not always in the order they were added, and that's about it. But something that is maybe non-obvious is that the act of enqueueing is a side effect; what we get back is just a confirmation that the message was added, not a function of the data, because the data itself can only describe a side effect. Queues are our way of dealing with actions, and for this reason I think in the Clojure community they've always been a bit of a sideline: Java provides us with queues, we use them occasionally, we abstract over them where possible, but we don't really deal with them head-on, or at least we didn't until we had core.async, which brought them into the central conversation. I think that's a good thing, because we are, of course, in the business of making computers do things; we don't just raise the ambient temperature of the room.

Looking under the covers, a core.async channel has not one but three queues. It has a buffer that holds the messages, but it also has a puts queue, where producers that are trying to add to a full buffer will wait, and a takes queue, where consumers that are trying to take messages from an empty buffer will wait. And once we've traversed that, once the producer has successfully handed off a message to the consumer, that's not the end of the story, because the consumer has provided a callback saying what it wants to do with the message, and that callback has to be run on a thread pool. A thread pool in Java has yet another queue, sitting between the people who want something done and the threads that can do it. Typically the size of a thread pool is dynamic, and what causes it to grow is trying to add something to its blocking queue and finding that the queue is full, at which point a new thread will be spun up, unless we already have the maximum number of threads, whatever we've defined that to be, in which case it will throw a RejectedExecutionException, which is something we should always be aware of when using a thread pool.

The blocking queue under the covers looks a great deal like core.async: it has a buffer that holds the messages, and it has these things called conditions, or condition queues, called not-full and not-empty respectively. Unlike a typical queue, though, a condition is just a place where threads park themselves; there's no actual conveyance of information. When a producer comes in and finds the buffer to be full, it will release its locks, park itself at the not-full condition, and wait to be notified that the buffer is no longer full, and when that notification comes, only one of the threads waiting there is actually woken up. The conditions are buried somewhere inside the JVM implementation, and I'm not quite sure exactly how they work, but they're a way for threads to coordinate amongst each other. Even those threads have, at the OS level, another queue in which they are scheduled, and even those software threads have to be queued onto the hardware threads, the cores on our machines. So between us and anything we want a computer to do, there are probably about half a dozen queues intermediating. Queues are ubiquitous, they're everywhere, and on some level it behooves us to understand them.
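As a concrete illustration of the thread-pool behavior just described, here is a minimal sketch in Clojure/Java interop; the pool and queue sizes are arbitrary assumptions. A bounded work queue plus a maximum thread count means a submission to a saturated pool is rejected rather than silently queued forever.

```clojure
(import '[java.util.concurrent ThreadPoolExecutor TimeUnit
          ArrayBlockingQueue RejectedExecutionException])

;; a pool that grows from 1 to 4 threads once its 16-slot queue fills up
(def pool
  (ThreadPoolExecutor. 1 4 60 TimeUnit/SECONDS (ArrayBlockingQueue. 16)))

(defn submit-task!
  "Try to run f on the pool; returns :rejected when both the queue
   and the maximum thread count are exhausted."
  [f]
  (try
    (.execute pool ^Runnable f)
    :accepted
    (catch RejectedExecutionException _
      :rejected)))
```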
The reason queues are so ubiquitous is that they separate what we want to have happen from when it happens. This is really valuable, because most of the time when we write code we don't know what else is going on: we don't know the context it's running in, we don't know what other requests are showing up, at what point they're showing up, or what their relative priority is. So the best thing we can do is just say that at some point we'd like this to happen. And because of this ubiquity, it behooves us to understand how queues work, how they fail, and what happens when they get overloaded. Luckily there is a thing called queueing theory, which unfortunately is more a branch of statistics than a branch of anything related to software specifically. I stand in front of you not as an expert in queueing theory (this is a slightly snarky slide which I can't quite back up), but as someone who has worked with systems a lot. I've made a couple of pretty honest efforts to wrap my head around queueing theory, and what I'm presenting to you is the intersection: the things that made sense to me and informed my intuition for building systems. If any of you are interested at the end of this talk and want to learn more, this is by far the best book that I found; it's still about 90% pure statistics, but it does mention computer systems occasionally. If you read it, what resonates for you, what fuels your intuition, may very well be different, so I really do encourage you not to take my word as the final word on the subject.

With that said, one of the key distinctions in queueing theory is between closed and open systems. A closed system is one where, as we produce something, we must wait for the consumer to complete it before producing something else. We're all very familiar with an example of this, which is a REPL: it reads our input, and it must evaluate and print the result before looping back. Another example of a closed system is a single person browsing a website: they receive a page, they click on a link, and they must wait for the next page to load before they can click again. Producer and consumer naturally cede control to each other. Closed systems are certainly a thing we deal with, and as it turns out they're fairly easy to reason about. But many of the things that give us trouble, and many of the systems that we build, are open systems, where there is no coordination: when a request comes in and we're processing it, it's almost a given that another request will arrive before we're ready. I'm going to talk about some more concrete examples of open systems, but it's a little hard for me to show you AWS graphs or something like that and really demonstrate the properties of the system; there are too many dimensions to capture. So I'm going to first show you a simulation of an open system, to demonstrate the more abstract properties, and then move on to something a little more concrete.

In the simulation, the arrivals are exponentially distributed, which means the time between each arrival follows an exponential distribution. Empirically this has been shown to be a pretty good model; it captures how often people show up at banks, or food trucks, or any other place where people compete for resources without any coordination with respect to each other.
The complexity of the tasks, however, is not well modeled by an exponential distribution. What I'm showing you here is a graph that looks quite a lot like the previous one, but it's a Pareto distribution, which lets us configure how fat the tail is. If you've heard of the 80/20 rule, that's a particular ratio between the top end of the distribution and the bottom end, and we can control that ratio to determine how often the outliers come in, how often we get unusually complex tasks. So I'm going to run a simulation where tasks arrive according to an exponential distribution, which means that even if we're getting, for instance, a thousand queries per second, they're not showing up at regular one-millisecond intervals; they'll pile up on top of each other, and there will be long stretches where nothing happens. The tasks themselves are going to be Pareto distributed, which means we start with a very thin tail, and over time it grows a little bit fatter. If you want a reason why this might happen, it might be that your disk is a little more under load, or the VM sharing the machine with you is taking up a little more, and so your machine is just a little slower. It's a subtle change.

The best way I have to demonstrate what happens in this simulation is something called a spectrogram, and since I don't know how many of you have seen one of these, I'll take some time to describe what's going on. The vertical axis shows the latency, which is the time that elapses between a task coming in, traversing the queue, and being consumed and completed. We're showing this on a logarithmic scale, without any numbers by the way, because everything here is made up; it's really just to show the relative behavior. But it's logarithmic because toward the end, as the system becomes unstable, that's the only way to make everything visible next to each other. On the horizontal axis we have the movement of the distribution from a thin tail, where most of our tasks are very easy and only a few are more complicated, to a fat tail, where proportionately more of them are complicated; again, a very subtle change. At the left side there is a very tight distribution: the heatmap represents the probability that any one of our tasks took a given amount of time, and there's a very thin red strip showing that most of our tasks complete in a reasonable amount of time. But as the tasks get more and more complex, we see it start to creep up, and about two-thirds of the way through there's a jump to a new equilibrium: all the red dots are scattered several orders of magnitude above where they were before. This is still an equilibrium, we're not out of control yet, but everything is significantly slower than it was. Then as it continues, it gets more and more out of control until it just takes off like a rocket. This is what happens when a queue grows without bound; this is a system in crisis, this is what will cause our system to run out of memory and crash. You'll also notice that whereas before the distribution was very tightly bound, now it's just a very thin strip, and this is because the time spent on the queue, the time you're waiting behind all the other tasks queued up ahead of you, dominates your latency, and all of that averages out: we now actually have a very consistent, but slow, time for each of these tasks.
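To make the model concrete, here is a minimal sketch of such a simulation, not the speaker's actual code: exponentially distributed inter-arrival times, Pareto-distributed task durations, a single consumer, and the latency (queue wait plus service time) recorded for each task.

```clojure
(defn- uniform []
  (- 1.0 (rand)))                            ; uniform on (0, 1], avoids log/pow of zero

(defn exp-sample
  "Inter-arrival time for a Poisson arrival process with mean rate `lambda`."
  [lambda]
  (/ (- (Math/log (uniform))) lambda))

(defn pareto-sample
  "Task duration with minimum `x-min` and tail index `alpha`
   (smaller alpha = fatter tail = more frequent extreme outliers)."
  [x-min alpha]
  (/ x-min (Math/pow (uniform) (/ 1.0 alpha))))

(defn simulate
  "Single consumer, FIFO queue. Returns the latency (wait + service) of each task."
  [n lambda x-min alpha]
  (loop [i 0, clock 0.0, free-at 0.0, latencies []]
    (if (= i n)
      latencies
      (let [arrival (+ clock (exp-sample lambda))
            start   (max arrival free-at)          ; wait until the consumer is free
            done    (+ start (pareto-sample x-min alpha))]
        (recur (inc i) arrival done (conj latencies (- done arrival)))))))
```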
So when a system is out of capacity, when things are running too slowly, we know what to do: we add more machines. And this does exactly what you'd expect. Here we add an additional consumer and we basically keep up; toward the end it begins to flare up, and maybe if the trend continued it would also fall over, but not yet. And if we add two more machines it's barely fazed; it's well within its operating abilities. A less obvious question, though, is what happens when we add consumers but increase the incoming rate by a commensurate amount. Here we have a 1x rate with one consumer, a 4x rate with four consumers, and 16x with sixteen. The key thing is that all of these fail at exactly the same time; this in no way increases our ability to handle volume. But what they look like as they lead up to failure is very different. Conceptually, the reason is that when an expensive task comes in and we only have one consumer, that consumer is entirely occupied by it and everything piles up behind it; these long-tail, complex tasks completely cripple our ability to keep up. But when we have four consumers, the other three can continue, so we're still able to make some forward progress, and with sixteen we need a very unusual confluence of events for all of them to be taken up with long-tail tasks at once. Notice, though, that the sixteen-consumer case looks very, very good almost right up until the point where it falls over. And I'll point out that this is the best analog for the systems we build: we have multiple machines, we have multiple threads, and so we can absolutely have a system responding in sub-millisecond time right up to the point where it becomes completely unstable, and that happens pretty much all at once.

So what we've learned here is that our systems are typically open ones, or at least those are the ones you'll struggle to understand as they run in production, and it's the long tail of your tasks that causes the system to fill up. We can't just look at the mean or median complexity; we have to be very aware of what the unusually complex tasks look like, and track that very closely. This matters not just because we're looking at the complexity of the things downstream of us, but also because the outliers in our own response time are what make the systems relying on us stable or unstable. Using more consumers will reduce our variance, right up until the point where the system actually falls over, which is inevitable: there is no way to protect against this, the data will at some point outstrip our capacity, if only for a short period of time. So the real lesson here, the one thing I hope you take away from this, is that unbounded queues are fundamentally broken, because they put the correctness of your system in somebody else's hands. You have no control over how fast the data shows up, which means someone can, intentionally or unintentionally, cause your system to break. Our strategy for dealing with too much data cannot be to hold on to it and hope for the best; we have to do something a little more proactive.
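As an aside, the multiple-consumer behavior described above can be sketched by extending the earlier simulation (reusing the hypothetical exp-sample and pareto-sample helpers): each task is served by whichever consumer frees up first, so a single long-tail task no longer stalls everything, but the total capacity, and the eventual point of collapse, is unchanged.

```clojure
(defn simulate-k
  "Same arrival and service model as `simulate`, but with k consumers
   pulling from one FIFO queue. Returns per-task latency."
  [n k lambda x-min alpha]
  (loop [i 0, clock 0.0, free-at (vec (repeat k 0.0)), latencies []]
    (if (= i n)
      latencies
      (let [arrival (+ clock (exp-sample lambda))
            idx     (apply min-key free-at (range k))   ; earliest-free consumer
            start   (max arrival (free-at idx))
            done    (+ start (pareto-sample x-min alpha))]
        (recur (inc i)
               arrival
               (assoc free-at idx done)
               (conj latencies (- done arrival)))))))
```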
So when we look at this problem of what happens when too much data shows up, we really have only three strategies. We can drop the data on the floor; we can reject the data, which is dropping it on the floor and letting the sender know we dropped it; or we can pause it, which is saying back to whoever is giving us the data, "I can't take this right now," and then at some point in the future saying, "now I can."

Dropping the data is actually a valid strategy for a certain subset of problems, where newer data makes older data obsolete: once we get a new piece of data, we can drop anything we had before, because it doesn't matter anymore. That's true for some kinds of telemetry and some sensor systems, but it's generally a pretty narrow set of problems. The other reason we can drop data is if we just don't care, if it's best effort: we'd like to have the data if we possibly could, but if we don't, oh well. I think we do this a little too readily, honestly. For instance, monitoring systems like statsd use UDP, which is best-effort delivery; you fire it off and pray that it gets there. And you'll see a correlation between the situations where the network is in crisis and the likelihood of this data being dropped, which means that when we most need accurate data, we're least likely to get it. So this certainly is a thing we can do, but where we can, we should look for a better approach.

Rejecting the data is pretty much what you see when you go to an overloaded website and it returns a 503, or when Twitter, back in the day, returned the fail whale, or any of these other sorts of things. This is the most common thing to do at the application level, because when we have too many requests we don't have enough information to do anything more graceful. We can't email the customer saying, "well, you can't check out your cart right now, but we'll let you know once you can"; we don't even know who they are, and we have no means to figure that out. So this is far from ideal; ideally we would never reject anything, especially if someone's going to pay us money for it. But we need to understand that it's inevitable, know what we're going to do in that situation, and make it as palatable as we can.

Elsewhere, though, when we're not writing code at the application level, when we're trying to write something more general, what we want to do is pause the data, or exert back pressure. In closed systems this is implicit: there's that ceding of control between the producer and the consumer, built right in. But if we're writing a library or a framework, we don't know what's appropriate; we don't know whether dropping data is acceptable, we don't know how to reject the data in a meaningful way, and so the best thing we can do is defer the choice to someone upstream who understands the application better than we do. This is why core.async has three queues instead of just one: it has the puts queue for producers that have had back pressure exerted on them, where we've told them, "we're working on it, but we can't quite get to you," and then we'll eventually be able to notify them, "okay, we took it, you can continue."
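Here is a small core.async sketch of those three strategies; the channel constructors and offer! are real core.async functions, while the handler shapes and buffer sizes are illustrative.

```clojure
(require '[clojure.core.async :as a])

;; 1. Drop: newer data obsoletes older data, or delivery is best effort.
(def telemetry-ch   (a/chan (a/sliding-buffer 1024)))  ; keeps the newest, drops the oldest
(def best-effort-ch (a/chan (a/dropping-buffer 1024))) ; drops new messages when full

;; 2. Reject: a non-blocking put; tell the caller "no" when the buffer is full,
;;    e.g. so the edge of the application can translate that into an HTTP 503.
(defn try-enqueue! [ch msg]
  (if (a/offer! ch msg)
    :accepted
    :rejected))

;; 3. Pause (back pressure): a blocking put parks the producer
;;    until the consumer has made room.
(defn enqueue-with-backpressure! [ch msg]
  (a/>!! ch msg))
```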
So queues, as I said, separate the what from the when. They also allow us to deal with potentially unbounded demand using very bounded resources, and they serve as an interface between systems where demand and supply may be mismatched, or where we can't predict what the relative proportion of each will be. These are all ways of saying the same thing, but the crucial point I want to make is that the buffer, the thing that holds on to the messages, is unnecessary for any of these properties. We could just as easily have a channel with only the puts and the takes, which simply waits to pass a message off once a consumer is available; in fact, in core.async this is the default channel configuration, it has no buffer. The reason this is good is that an unbuffered queue is a closed system: there's that implicit ceding of control, and we only proceed when someone else actually takes the message off our hands.

So you may wonder why on earth we'd even have buffers, since buffers seem only to introduce latency. The answer is that exerting and retracting back pressure is not free. If we want to restart a thread we stopped, we have to copy its stack back into memory. If we want to restart TCP traffic we stopped, we have to send a message over the network and wait for the rate of traffic to ramp back up. If we're reading off a disk or a database, we have to go and actually do that. The time that elapses between us saying "we're ready" and someone actually giving us something to work on hurts our overall throughput, our actual rate of handling the data. That's what buffers are good for: they keep our throughput stable and at a higher rate, at the expense of latency in our systems.

So we should always plan for too much data; it's inevitable, and we need to know what our response will be. Wherever possible we should use back pressure, because that's the most neutral choice we can make: it just tells someone else, "I can't handle this data, you figure out what that means," and that might flow up the chain all the way to the periphery of our application before something says, "I guess I have to reject, or drop, or do whatever is appropriate in this situation." That logic should be centralized; it should not be sprinkled throughout. Buffers are useful in that they give us consistent throughput, but at the expense of latency, so we should only use them where necessary, and in practice we should avoid composing buffered queues wherever possible, because each additional buffer tends to magnify the effects of the next.
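In core.async terms, the buffered-versus-unbuffered trade-off described above looks roughly like this (the buffer size is arbitrary):

```clojure
(require '[clojure.core.async :as a])

(def rendezvous (a/chan))     ; the default: no buffer, a put parks until someone takes
(def smoothed   (a/chan 64))  ; fixed buffer: absorbs bursts, at the cost of queueing latency

;; with the unbuffered channel, the producer below is part of a closed system:
;; the put only completes once the consumer has actually taken the value
(a/go (println "got" (a/<! rendezvous)))
(a/>!! rendezvous :value)
```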
As we've gone through this, we've arrived at some guidelines, and it's worth taking a moment to compare them to the current state of the Clojure ecosystem. A lot of attempts take something like core.async, which absolutely implements back pressure and does so flawlessly, and hook it up to some networking mechanism, something where, again, we have no control over how much data is coming in, and they don't hook it up properly. In some cases this is unavoidable: the Java EE spec for WebSockets, for instance, has no mechanism for back pressure (I don't know why, but it doesn't), and so when you're trying to use it in a servlet container or something like that, there's no real way to solve the problem. But in other cases it's just an oversight: this particular author has never been bitten by the problem and doesn't see how important it is to make these connections, to not break the chain of back pressure. So we, as consumers of these libraries, need to be more discerning; we need to ask the tough questions when we look at this sort of thing and make sure we're not building on top of shaky foundations. This is not to call out any of the authors, many of whose libraries are used to good effect within our community, but I think we want to make sure that our ecosystem can stand up to any scale of system, that we're not forced to rebuild from scratch once we hit the point of failure, which can happen all of a sudden.

A concrete example of this is an open system I work on called Maw. Maw is an omnivorous endpoint that just absorbs any data our partners or customers send to us. The general idea looks like this: we have a large amount of data coming in, we need to do a little processing on it, and then we need to persist it somewhere for batch processing in the future. Because I didn't want to have to work very hard on the hard parts, I decided to do it on AWS, which gives us the Elastic Load Balancer, a place you can point arbitrary amounts of traffic at and have it spread across a bunch of machines, and I wanted to persist to S3, which is a replicated data store that is highly available and able to deal with whatever volume of data we want to write to it. The system has some pretty particular properties. It has to be horizontally scalable, because we don't know how much capacity we're going to need: when it first started up we were getting about 40,000 queries per second, and about a year and a half later we're up to about 300,000 at peak, and I wanted that to just work without my having to go back and rethink the entire thing. It must be low maintenance: this is far from the only thing I'm responsible for, and if it takes more than a day out of a month of my time, something has gone horribly wrong. It's not real time: we don't really care about the data getting into S3 at any particularly rapid pace, as long as it's there by the end of the day. And the most important property is that loss of data is more or less directly proportional to lost utility: if we lose 1% of the data, we lose about 1% of the utility. This is very different than if we were doing, say, payment processing, where losing 1% of the data would make us useless, and this very particular property makes my job a lot easier.

Previously we had set up a system that absorbed the Twitter garden hose, which is 10% of the full firehose (that's the official term), and that ended up being about 600 tweets per second. I had jammed something together that took an HTTP client stream, put it through the Hadoop S3 client, which was something we had on hand, and wrote it to S3, and this worked very well. So in my initial "let's just get something up and running" version, I adapted that, put a web server in front of it, and wrote the entries, which I assumed to be about 20,000 entries per second, through the Hadoop client.
But what I was unaware of, what I had not really been tracking with the Twitter client, was that occasionally S3 becomes unavailable. For whatever reason, and I still don't really know why, it just stops accepting writes for a bit. What happened in this case is that the Hadoop S3 client held on to everything, hoping that eventually things would work themselves out and it would be able to flush all the accumulated data, but in fact the data was coming in at a rapid pace, so we ran out of memory and the system fell over. Then all the traffic that machine had been handling fell over to the other machines, which themselves fell over, and the entire system was dead. To be clear, this was almost certainly happening with the Twitter client before, because I don't see why it would have behaved any differently; I'm sure that many times it held on to the data and was eventually able to flush it out. But that was invisible to me; this was not a failure mode I was thinking carefully about until I saw it and had to deal with it.

When I thought about this a little more clearly, though, I realized I was making my job a lot harder than it had to be, because I was treating "handling" the data as getting it all the way into S3, which is over a network hop, requires replication, and everything else. Really, all I wanted was for the data to be somewhere other than memory. I could write it to the local disk, for instance, which as it turns out is much more highly available than S3. Obviously this has some shortcomings: if we lose the machine, we lose the data. But for this particular use case, that was something I was okay with. So I ended up writing two libraries, both of which are open source. One is called durable-queue, which is exactly what it sounds like: a disk-backed queue, very simple, very well tested; at this point it's had about three to four trillion events flow through it. The other is s3-journal, which is built on top of durable-queue: it writes each of the things it intends to upload to S3 to disk, and then on the other side has a consumer uploading them as it's able.

So the system I ended up building looks like this. There are now two stages: the HTTP server is only concerned with getting the data onto some persistent store, and then there's a separate producer/consumer loop uploading it to S3. Of course, the fact that I've made my job easier doesn't mean it's always going to work, so I need to make sure everything is working. One of the flaws, arguably, of the Hadoop client wasn't just that it had an unbounded queue; it's that the queue was invisible. We don't know what's going on inside it; we only know by inference, once the system fails, that its memory ballooned out of control. So s3-journal and the underlying queue both have metrics: you can go and ask, what is your status, how many things have I enqueued to you, how many have you successfully uploaded. And the queue underneath has even more: how many slabs (the files it's writing to) are currently resident in memory, how many things are pending, how many things have you had to retry. This is crucial, because it lets us say what the health of our system is, and it lets us quantify how much we might have lost when a machine goes down.
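A rough sketch of that two-stage design, using durable-queue's put!/take!/complete! API; the directory, queue name, and handler shapes here are illustrative assumptions, not the actual Maw code.

```clojure
(require '[durable-queue :as dq])

(def q (dq/queues "/var/data/queue" {}))     ; disk-backed slabs live in this directory

;; Stage 1: the HTTP handler only has to get the entry onto local disk.
(defn intake-handler [request]
  (dq/put! q :entries (slurp (:body request)))
  {:status 202})

;; Stage 2: a separate consumer loop drains the disk queue and uploads to S3.
(defn upload-loop [upload-entry!]
  (loop []
    (let [task (dq/take! q :entries)]        ; blocks until something is on disk
      (upload-entry! @task)                  ; deref to read the persisted entry
      (dq/complete! task))                   ; mark it done so it isn't retried
    (recur)))
```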
These things are absolutely critical, because we cannot prove properties about our system; the system is too complicated. We can only look and see what happens, and only insofar as the metrics we've created for ourselves allow us to. On the far side of this, on the HTTP intake side, there's the additional problem that we need to respond quickly to everyone sending us data, telling them "yes, we've accepted your data," because otherwise they will scale back the volume of data they send us, and we're trying to get everything we possibly can. Underneath, the Aleph HTTP server uses Netty, and Netty has a very particular threading model where every time a connection is opened it's assigned to a thread, and many connections are assigned to a given thread, which means that if any one connection takes up that thread, blocks on it or something, it implicitly blocks a bunch of other connections that didn't really want to be included in that decision. The way you deal with this is to put another thread pool behind it, something that spreads the work out and makes sure no one connection acting badly can cause other things to back up. And when we look at the metrics here, if we put something in our Ring handler that measures our response time, it's actually fairly far off from the real response time, the time from the packet coming in to the packet going out. This is because thread pools are themselves not very well instrumented; they're this invisible thing, there's a queue, we spend some time on it, and it happens eventually, who knows when. So there's a library I wrote which has a bunch of stuff in it, one piece of which is an instrumented thread pool, something that actually tracks all the relevant statistics.

When I go to any of the Maw servers that are running, there's a JSON endpoint that looks like this, and I'm going to focus on two aspects of it. One is the measurement of the task rejection, completion, and arrival rates. Notice that none of these are simple scalar values; they're all quantiles, because again the rate of arrival will spike and dip as different requests, which are not coordinated, show up at the same time or not at all. You'll notice that for the arrival and completion rates, the difference between the median and the 99.9th percentile is about 3x, and when we talk about capacity planning we have to think about that. Likewise, we are in fact sometimes rejecting data because we don't have the capacity to handle it; it seems to happen only very infrequently, but it's something we should absolutely be tracking. There's also the task latency and the queue latency, where the task latency is a superset: it represents the entire time spent on the queue plus working on the task, while the queue latency represents just the time spent on the queue. You'll notice that the 99.9th percentile for the queue is actually higher than for the task; that's because these are sampled statistics, and sometimes what the queue samples and what the task samples will be slightly different. So this is imperfect, it's not a perfectly high-fidelity view of our system, but it's good enough. You'll also notice that the queue is zero in length all the time, because it's actually an unbuffered queue, and yet somehow we're sometimes spending two milliseconds traversing it. That's because underneath our queue, which is unbuffered, there are many queues which are buffered: all the things that sit beneath Java's execution model.
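As a sketch of that "thread pool behind Netty" arrangement: Aleph lets you hand the Ring handler its own executor rather than running it on the I/O threads. The handler and pool size here are arbitrary assumptions.

```clojure
(require '[aleph.http :as http]
         '[manifold.executor :as ex])

(defn intake-handler [req]
  {:status 202, :body "accepted"})           ; placeholder handler

;; handlers run on this pool, so a slow or blocking request can't stall
;; the Netty event loop that its connection shares with other connections
(def handler-pool (ex/fixed-thread-executor 32))

(def server
  (http/start-server intake-handler {:port 8080, :executor handler-pool}))
```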
So again, we need to measure this; we need to be aware of these things and not treat the hand-off between producer and consumer as some instantaneous process. So we have metrics now, something that gives us a little more visibility into the system. It obviously doesn't capture everything, it doesn't look very deep into Netty, and if we ever saw a huge discrepancy between the response times we were seeing and the response times being reported outside our system, we'd have to dig into that. There's always something more to measure, but the idea is that we don't want to be willfully blind, to close our eyes and pretend everything's okay. It's really crucial when we're building these systems that we understand what it means to complete something, because too many incomplete things means we need to apply back pressure, or drop data, or reject it, and if we can move the goalposts and make completion easier, that's a huge win. You should be picky about your tools; you should make sure they don't subvert your assumptions about what should be happening, or what needs to happen, in these sorts of systems. And when choosing a tool, raw performance is good; benchmarks tell us something, which is that if I use this correctly it will probably be fast. But you should always prefer something that actually tells you how fast it is, because the assumptions baked into these very contrived benchmarks won't necessarily carry over into reality. You're never going to be able to measure all of your system; there will always be things outside your control. Someone accessing your website from a mobile connection might go into a tunnel and lose their connection for some number of seconds, and that's clearly not something you can account for or prevent, but it is what they perceive your response time to be. Whether or not that matters, whether or not it's something you're trying to deal with, depends very much on your application, but there will always be things you're unaware of.

So I'm going to talk about another example here, a common, fairly toy use case you see with these sorts of things in Node.js or elsewhere, which is a chat server: you join a room, there are a bunch of people there talking to each other, and every message is broadcast out. I'm going to use a library called Manifold, not because it's necessarily better at this than anything else, but because it's something I know and it has some nice operators that are useful in this situation; you could do something very similar with core.async or any number of other stream abstraction libraries. In Manifold there's something called an event bus, which is just a publish/subscribe model: we can publish out to a topic, any subscribers on that topic will receive the message, and if there are no subscribers it's a no-op; the message just vanishes into the ether.
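A minimal sketch of Manifold's event bus as just described (manifold.bus); the topic name is arbitrary, and publishing to a topic with no subscribers simply discards the message.

```clojure
(require '[manifold.bus :as bus]
         '[manifold.stream :as s])

(def chatrooms (bus/event-bus))

(bus/publish! chatrooms "lobby" "nobody hears this")   ; no subscribers: a no-op

(def subscription (bus/subscribe chatrooms "lobby"))   ; a stream of messages on that topic
(bus/publish! chatrooms "lobby" "hello")
@(s/take! subscription)                                ; => "hello"
```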
So in the chat handler, when we actually get a connection, we find out what room they want to join, then we take every message from that connection and publish it out to the room they've selected; very straightforward, we're just taking everything and pushing it through another stream. Likewise, we subscribe to the room, take everything that's said in it, and pipe it back into the connection. So this is a chat handler; this is IRC in its essence. But there are some problems here. One is that if any of our consumers, anybody in the chat room, is slow to receive a message, that slows down everybody, because the act of publishing means the message has to go to everyone. We want to control this; we want to make sure one bad actor can't destabilize the entire system. So when we look at this connection, the one taking everything from the room and piping it over to this client, whoever they are, maybe we want to add a buffer, to be able to ride out a few of these instabilities; if they're just a little bit slow, we can make that not too visible to everybody else. Of course, measuring this in messages may not be the right thing, because who knows how big a message is: it might be one word, it might be a novel. A better approach is to have some sort of metric, something that tells us how big each message is, and then put an upper bound on the aggregate size of all the messages, in this case just a thousand. But even that doesn't necessarily prevent us from arriving at the same failure mode; it just defers it, puts it off a little further into the future. So the only thing we really can do is, once someone is slow to receive our messages, is unable to receive them, we kick them out. The connect method here takes a message source and pipes all the messages from it into a message sink, and there's an optional timeout, which says: if I've been trying to give you a message for X number of milliseconds and I can't, I'm just going to close you and wash my hands of the whole thing. In this case we say: if we've been trying for ten seconds to send you a message and you haven't received it, your buffer has been full that entire time, we're done, you're gone, you can just reconnect. And this is really the only thing we can do; there's nothing else that will save this person from themselves.

But this creates another problem, which is that if someone is publishing messages too fast, they can overwhelm everyone and get everybody kicked. Again, this may be because someone's a jerk, it may be because someone's naive, and you can't really differentiate between the two. It's funny, but it's also your job to make sure your system is robust to the rest of the world, whatever that means, whatever motivations you ascribe to it; it's your job to make sure you don't get knocked over, that the many don't suffer because of the actions of the few. So you need to think about this carefully. In this particular case, we can throttle the connection: we can say you're only allowed to send us one message a second. And note that we don't actually need the same disconnect logic here, because if someone sends us messages too rapidly, we'll just keep using TCP back pressure and saying, "nope, we're not taking it," and if that causes them to fail, that's their problem.
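Putting those pieces together, here is a sketch of the chat handler in the style of the Aleph/Manifold chat example. The size bound of 1000 and the ten-second timeout are the illustrative numbers from above, and the exact option keys and arities should be treated as assumptions rather than a definitive implementation.

```clojure
(require '[aleph.http :as http]
         '[manifold.stream :as s]
         '[manifold.deferred :as d]
         '[manifold.bus :as bus])

(def chatrooms (bus/event-bus))

(defn chat-handler [req]
  (d/let-flow [conn (http/websocket-connection req)
               room (s/take! conn)]                        ; first frame names the room
    ;; room -> client: bound the backlog by aggregate message size,
    ;; and close any client whose buffer stays full for ten seconds
    (s/connect (s/buffer count 1000 (bus/subscribe chatrooms room))
               conn
               {:timeout 1e4})
    ;; client -> room: throttle to ~1 message/second and let TCP back
    ;; pressure handle anyone who keeps sending faster than that
    (s/consume #(bus/publish! chatrooms room %)
               (s/throttle 1 conn))))
```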
But throttling at one message a second may seem a little irritating: what happens if someone's been silent for ten minutes? Are they still only able to send one message a second, or can they send ten minutes' worth of messages all at once? Both are possibilities when we talk about throttling, and so here we're able to specify the maximum amount of credit they can accrue, in this case ten seconds' worth: if they've been silent for ten seconds, they can send ten messages all at once, but no more. And when we look at this example now, it's a little less platonic than before, a little less general, because we've had to make all these decisions: what the maximum amount of memory is that we're willing to give each of these people, how long we're willing to wait on them, how many messages people can send us. All of a sudden we've gone from this thing that was a little hand-wavy, a little vague, very general, where everybody could project their particular chat room onto it, to a very precise chat room. Maybe not a very good one, but it's certainly precise in terms of what it does. And this is what you need to do; these are the steps you need to take. I want to emphasize that it's not hard: you have to think a little about what these boundary conditions should be, but this is what's required to build robust systems, to be able to ride out the outside world.

I want to make one additional point about the chat room example. All the other examples I've talked about had a queue, or maybe two queues, with a fixed cardinality, which means we can monitor them, have graphs associated with them, basically look at them and understand their behavior very precisely. But with the chat room example we have an untold number of queues; they're constantly disappearing and reappearing in new configurations, and so the typical monitoring approach, looking at each individual queue, is not sufficient, and it becomes very hard to understand the system as a whole, because the system is constantly changing out from underneath us. I think this is a real problem. If you think about what this would look like in core.async, you have all these go routines shoving messages in every direction, but you can't look at it and apprehend it as a whole, and I think that's a weakness, at least for certain classes of applications. So in Manifold I've played around with this a little, and one of the things you can do is look at any stream and ask it what streams are downstream of it, what it's feeding into. This works because all these methods, the connect method and the consume method, actually update an internal topology under the covers, which means we can, to some degree, trace the system: look at each of the streams and understand how they feed into each other. It's not clear to me exactly how this should be represented; in some of my earlier attempts I had Graphviz visualizations, but I've come to the conclusion that that doesn't scale very well, because it doesn't take a very complex topology to make it hard to understand.
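Inside the chat-handler sketched above, the inbound side becomes the credit-accruing variant; I'm assuming here that manifold.stream/throttle's optional middle argument is that burst allowance. The last line shows the topology introspection just mentioned: any stream can be asked what it feeds into.

```clojure
;; ~1 message/second, with up to 10 messages of "credit"
;; accruing while the client is silent (assumed semantics)
(s/consume #(bus/publish! chatrooms room %)
           (s/throttle 1 10 conn))

;; the internal topology maintained by connect/consume: what does this stream feed?
(s/downstream conn)
```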
So maybe the right thing is to come up with a better visualization tool, maybe it's to dump the topology to disk for examination later; I think this is a rich topic, ripe for exploration. But for Clojure to become an ecosystem in which we can build these sorts of systems, where we don't just cross our fingers and hope they don't fail, this is necessary: we need to answer these questions, and we need to have good answers.

So, we've talked about queues, and the things we've learned are these. Unbounded queues aren't really unbounded; they're bounded by memory or some other fixed, shared resource that will cause our system to fall over once it's exceeded. When we build an application, it should always account for what happens when it receives too much data. But where we're not at the boundaries of an application, where we're writing something more general, less opinionated, or reusable elsewhere, we should always use back pressure; we should defer to someone else who knows better than we do what to do. And we should demand metrics everywhere: even if we don't always get them, even if there are always things a little out of sight, we should keep asking for more, and we should always know how we could go one step deeper. And with that, I think we're pretty much exactly out of time, but I can take questions in the hallway.