To Clojure and back: writing and rewriting in Ruby

Hi everyone, thank you so much for coming. [inaudible] My name is Phillip Mendonça-Vieira. You can find me on various social media at phillmv, with two L's, in many other places online as well as Twitter. I am one of the cofounders of rubysec.com, where we host the Ruby security advisory database, so if there's anyone in the audience who maintains gems, I would love to talk to you afterwards about how you handle security disclosures. For my day job I run a startup called Appcanary. Appcanary notifies you whenever you are running a vulnerable package on your servers or in your applications, so basically we spend a lot of time thinking about whether you are running vulnerable software.

This talk is ultimately about a series of technical decisions we made and implemented while building the service that powers Appcanary. To best understand our thinking process, we should start the story at the beginning of our incredible journey.

Back in 2012 I began a consultancy with a friend of mine, Max. We specialized in doing penetration tests and security code audits, building MVPs and business automation, and advising teams on how to improve their software development process. One benefit of being a consultant is that you have a lot of control over your schedule. By the time November 2014 rolled around, I had just spent two months working 12-hour days on the Toronto mayoral elections, and my co-founder Max had just had a magical summer at the Recurse Center. As a consequence I was really burned out and he was really bored, and we were both looking to do something more challenging. We'd happened upon this market opportunity that could really use our skills, and we said, okay, let's build a product.

We'd had some experience with this problem domain because we'd built a free service called Gemcanary a few years earlier. In a nutshell: an advisory has many vulnerabilities, a vulnerability has many packages, a package has many versions, and we need to keep track of all of this in order to tell you if you are running a version that you shouldn't, right? So we looked around, and Max turns to me and says, hey, let's use this thing called Datomic.

If you've never heard of Datomic, Datomic is really cool. It's this key-value store, graph database, where instead of SQL you write Datalog, which is a Prolog kind of language. On top of all this you get a free point-in-time database, which means you can roll back to any previous state of your data; basically nothing ever gets deleted. Thinking about our struggles with the free service we'd built, we were like, this would be really handy, let's try using this. Unfortunately all the client libraries for it kind of sucked, but it works really great with Clojure. So let's use Clojure. Clojure is really cool, right?

For those of you who don't know, Clojure is a functional programming language that features immutable data structures, gives you all sorts of asynchronous primitives for free, and runs on the JVM, which is supposed to be web scale, I don't know. And I mean, shouldn't we all learn Lisp? Isn't that what you're supposed to do? Isn't Lisp supposed to be this profound, enlightening experience that will forever change how you program for the rest of your life?

For me personally, at this stage I'd been writing Ruby for five years, most of my career (you know, it doesn't have a beard), and one thing that kind of dominated my thinking was this fear: am I stagnating? Because when I jumped into Ruby it was synonymous with bleeding-edge web tech, and now not so much anymore. Am I still going to be able to get a job? What actually happens to programmers after the age of 35? Do they just vanish or something? And I really didn't want to spend the rest of my life writing JavaScript. So the
idea that there was this other technology set that I could understand, and kind of broaden my skills with, was really appealing. So we said, we're going to run out of money in six months anyways, let's do it. We went out and built it in Clojure, and it was really great.

For people in the audience who might know Clojure already: I know I'm using the wrong words, because I'm assuming people in the audience are more familiar with Ruby, so please bear with me.

Clojure is a bit familiar, right? Both Clojure and Ruby are truthy, where things are either nil or false, or good enough. They both spend all their time dealing with hashes of symbols, and they both have all these fun metaprogramming constructs. So why is Clojure cool?

Let's start with the big one: functional programming. In Clojure, functions are the basic semantic building blocks. You're encouraged to capture behavior in the smallest units that are practical. In this little function here, we're picking out all the even integers by applying the even? function to this array. And if you squint, you've seen this before, right? This is not unreasonable to you. There's this old joke that Ruby is an acceptable Lisp; there's a blog post from, like, 2005 that makes this argument. But if you think about it, in Ruby even? is an artifact of Fixnum and select is defined in Enumerable. Contrast this with the Clojure code: filter is the one that worries about what kind of data structure it takes. The data structure itself does not know; the hash or the map doesn't really concern itself with it. And even? is just any function. This is really cool, right? It's not a property of the numbers or the things inside the array; I can do whatever I want in there. Being able to combine bits of behavior like this can be really powerful.

Another thing that's really cool about this is that when everything is a function, nine times out of ten you can just select the whole thing, cut it out,
create a new function, paste it in, and call it where the thing was originally. Then you're free to just change this new structure; you don't really have to worry about it. Everything gets passed in, you don't think about it.

And finally, immutability is really, really interesting. Historically it was an expensive feature to have, until in 2002 there was this breakthrough and a bunch of cool papers got written, and it became a reasonable thing to put in your computer. In the most trivial sense, immutable means something that can't change. In practice it means that for every insert, update or deletion you end up with a brand new structure. This is important because, if you think about it, mutating state brings a lot of complexity with it, and if you can just remove having to worry about things changing when you're not looking, huge amounts of bugs are just gone.

The best way to illustrate this is with a little bit of Ruby code. Suppose we have this list, just a bunch of strings up in there, and we assign this list to another variable, and then we apply this append method to it, appending, whatever that means, "d" to the first list. What happens to list two? In Ruby this question requires a lot of thinking. Are strings mutable? Are arrays mutable? Is Ruby pass-by-reference or pass-by-value? And most importantly, does that append method mutate or copy? Because you could have something with the same semantics, the same purpose, that does the same thing, but in practice has very different outcomes. This method will insert something at the very end of the array, and this one will create a new array. You don't know that from the call, and when you use them in practice they have very different implications.

The thing about Ruby is that you can't get away from this. Freezing won't save you: I can freeze this array and I
can still modify the contents of it. I can't add things to the array, so pedantically that's true, but for my purposes some other thing could still mutate my state. I can't dup my way out either: I can dup something and still modify the contents, because in Ruby I've cloned the array but it's still pointing back to the original objects, which in this case are mutable strings. Literally the only way to guarantee that your objects aren't being messed with is to do some atrocity like this. Now, I have code that does this, because I had to: I had a thing deep in the bowels of ActiveRecord that had to do something, and I was like, why is my thing disappearing? It turns out it consumes the objects, literally. So this is code I use, and I think we can all agree this is kind of ridiculous. But it's Ruby, so there's nothing you can do about it. You have no control over whether something can be mutated or not.

Which is really unfortunate, because once you have immutability, I found in my experience going through this, the flow of state through your app becomes obvious and predictable. There's a way to illustrate this. This next page is a function from our old app. It's just a route that takes in the user, a database and some parameters that we've passed in, processes the stuff and tries to parse it out, because we deal with a lot of text: you send us text, we figure out what packages you have, stuff like that. The specifics don't really matter. What's cool about this is that I know from the level of indentation that the stuff at the furthest level of indentation literally can't modify things outside its lexical scope, because they're just immutable. I can maybe redefine the meaning of a variable if it's in the same scope, but stuff that's outside of it, literally I can't
touch it, because it's all immutable. And that's really cool; there's this kind of confidence you can get out of it.

I don't have all the time in the world, but I'm going to give you a quick illustration of why Datomic is kind of cool. Your boss comes up to you one day and she says, hey, this report, it's really cool, but can you just run it with last year's data? In traditional databases this depends entirely on how you designed your system. Either you specifically designed it so you could do this (you took snapshots of the data, or your report collects the right things, or whatever), or this rather simple request can actually be a nightmare to execute on. What's really cool about Datomic is that I can have a function that, given the database, will give you whatever value you're interested in. Then I can grab that database and say, well, this is cool, but I want exactly what I had a year ago, and I just rerun the same report and it just works. That's cool, right? I'm not saying that Datomic, as a key-value graph database, is something you want to do reporting on, but in principle it is the kind of stuff you can get away with.

All right, cool. We really want to play with this new technology. It's cool, and we've got a use for it. But I'd spent years building websites, and I have zero interest in figuring out user authentication and cookie management and whatever in any language. I already know how to do that. Let's just do the hard parts in Clojure. So we ended up with something like this: we have this API that speaks to Datomic, and we have this Rails quote-unquote front end that does boring things like talk to Stripe and deal with users and stuff. Cool, great.

In the meantime: we started working on this full-time in February 2015. In May, much to our shock, we got into Y Combinator, which meant that, you know, money
was slightly less of a problem. In July we released to production, and then we moved back to Toronto in October and just slaved away at this, adding more features. So we ended up with paying customers, real demands and real problems, because it was taking weeks to ship and debug simple features even though we were holed up in our office working our butts off.

In retrospect we had three broad issues. Number one is that we just made a bad architectural assumption: we underestimated the domain complexity. It turns out that "what is a package, really?" is a deeper question than we considered at first, because everyone has a slightly different definition, and it meant that we had this messy data model that was hard to modify. Number two, we had these separate deployments, which added all this overhead, and it turns out there is a high fixed cost to orchestrating all these different services at once. I have more to say about this, so we'll come back to it a bit later.

On top of all this, I found myself really struggling with the environment that we were in. It kind of felt like death by a thousand paper cuts: all these different small issues that in and of themselves are not deal-breakers, but they added up.

First of all, it's not clear how to structure large apps. If I'm trying to model a chess game, that works pretty well. But then if I have users that come in, and they have these preferences that have to be stored, where do I put this and this? There isn't a lot of literature on how to structure these things. Rails kind of holds you by the hand, and you get used to that.

Clojure can be really fun, too. A lot of the time you feel really clever writing it; you're like, yeah, this is computer science right here. But as a consequence it can be really hard to read. There's this in-between state where I understand how the syntax works, I know what most of these
functions mean, but I have to think really hard about what all these things are doing as they unpack themselves. It can almost be too expressive; you can pack too much meaning into too little space. And Clojure has all these deep subtleties. When I was writing this, I came to the conclusion that a good analogy would be C++. You can pick up C, the subset of C++, really quickly, and you can be productive in it, but in order to read anyone else's code there's this infinity of other stuff you need to understand. You have templates, you have Boost, to go on with this analogy. In Clojure land you have reducers, transducers, atoms, agents, protocols, reader macros; it just goes on and on, because they have all this cool stuff that's been bolted on, but it makes it really hard to feel confident about what you're doing.

Another problem is that when everything is an anonymous function, your stack traces are useless. Some of you might have direct experience of this from dealing with JavaScript: you have some complicated nested JavaScript and it barfs in your lap, and you're like, what happened? And it's like, well, somewhere in here there was a problem. When you're debugging something in production, that's not a good feeling. I'm just like, I hate you, I need to figure this out.

Compounding this, there's comically terse documentation. It's like, "in order to append, you take a list of fs that have x". I couldn't bring up a good example, but it's tough to parse out sometimes. There are gotchas that are just hidden, et cetera.

There isn't really a debugger in the system. There's the REPL, which Lisp is famous for, but I can't really attach something and say, aha, here's the data, show me what's happening. I don't know.

And finally, the Java Virtual Machine is just... I mean, our app took 30 seconds to boot, which means that deep editor integration is required in order for you to not tear your hair out, which
in Clojure land means that I sure hope you love Emacs, because if you don't... I mean, you can make it work with, like, Java and stuff, but then you're dealing with Java-land tools, which, maybe it's just not me.

The deployment story in Java land is kind of complicated. You have to grab all your dependencies and shove them into one object and then push that onto your server, which meant hundreds of megs somehow, which meant that I couldn't fix something from a coffee shop, which is, like, oh no. And our app took gigabytes of RAM. It literally needed about a gigabyte in order to boot, and I still don't comprehend why. And finally, just miscellaneous stuff. Fine.

This is all just another way of saying that I'm really fluent in Ruby in a way that I'm not in all this other stuff. It's not necessarily their fault for any of this. Six months of Clojure, or twelve months of Clojure, doesn't compare to five or six years of Ruby. I'd gotten to a certain stage in Ruby where it's like, well, you see, the difference between 2.3.11 and 3.2 in Rails land is that you'd have these different [inaudible]. You don't have that same depth of knowledge.

Regardless, I spent a lot of time feeling frustrated because I was fighting the toolset. And I don't know for certain that I'm not stupid, but I'm pretty sure I'm not stupid. It wasn't until I found a comment on Hacker News that described the ecosystem as user-hostile that I was like, oh, it's not me, other people feel this way, and everything kind of clicked. In a nutshell, if I had to summarize it, it seems like developer happiness is not a virtue in that community. The best example off the top of my head: I went to this local user group and I'm like, hey guys, I need a debugger. I do debugger-driven development, that's how I live. And they're like, well, you have a console, that should be good enough. What do you need a debugger
for? You can just type all of your state into the console and deal with it. And I'm like, hmm. So I had this strange feeling of culture shock.

So time passes. It's now May of this year, and we knew we had to refactor the application. At the very least we needed to clean some things out, because if you're holed up in some ass-end bit of suburbia, desperately working, trying to birth this app, you're not going to make good long-term architectural decisions. We needed to do some spring cleaning. I happened to be visiting a friend in California, and I'm like, well, you know, if I just spend two weeks cleaning things up it'll be great. And my friend turns to me, deadpans, and goes: just rewrite it already. And I'm like, huh. I mean, it's not an original idea, I'd thought about it before, but that was kind of a watershed moment, after which I couldn't really ignore that option anymore.

The problem, of course, is that rewrites are risky. When you engage in a rewrite, you put in a hell of a lot of effort that, if everything goes well, no one will ever notice. That is the success condition. The best-case scenario is that your users log in and they go, oh yeah, this is exactly the thing I had yesterday.

That said, we were spending all this time fighting the ecosystem, and we'd be migrating to a platform we know really, really well. Frankly, we were not using Datomic properly, flat out. You shouldn't just be shoving everything in there; you should have specific things you're trying to do with it. And most importantly, we'd be reducing the number of moving parts that we had. Because this is what shipping features looked like: we would add features and tests to the quote-unquote API; we would submit it for code review, aka have the other person review it; we'd deploy it to staging; we would build the UI in the web app, get it to talk to the API via JSON, test that, code review that,
and then manage to deploy both apps to production simultaneously. It turns out that if you do something like this with two people, it's a bad idea. Don't do that. We had actually, unintentionally, built a distributed system, and microservices incur large fixed costs that are really difficult for small teams to pay. This is fine if you're a large company, because your service boundaries should roughly mimic your team boundaries. If you can't change the color of a button without having three meetings about it, microservices are fine; that's not the expensive part of your overhead, and you can orchestrate that. But for a two-person startup, we were spending all this time, whether it was fighting Clojure (or me personally fighting Clojure, my business partner [inaudible]) or fighting an architecture that we'd accidentally set ourselves up with, and the end result was that we were spending a lot of time working on things that don't really matter.

Dan McKinley, who was an early guy at Etsy and does a bunch of other stuff, has a somewhat famous talk called "Choose Boring Technology". The key takeaway from it, the one that really sat with me, was that you can work on about three hard things at once, and you should make sure that those are the things that matter to your business. We built ourselves an object-relational mapper for Datomic. No one cares! No one is going to send me emails like, aw man, you know, I pay you X dollars per month, but that GitHub library you released is dope. Not a thing.

So in the end, the way we thought about it, we were reducing our exposure to risk, which is what you want. We're already doing something really risky with our time, money and careers; you don't want to do things that don't pan out. As a small operation, we had the luxury of stalling a little bit. If you're Netscape in 1997, or whenever they did their
big rewrite, they had millions of customers and people breathing down their neck. Bit different. And we set a deadline, sort of, eventually. If anyone tries to do something without a deadline, they're not serious about it, so that's just kind of a quick pro tip in there.

Cool. So we set about it. We did it over the summer. On October 11th of this year, which is not that long ago, we had 8,300 lines of Clojure and about 3,000 lines of Ruby, and on the following day we had 8,700 lines of Ruby. We still have 1,500 lines of Go; that's another story. So it took four months to rewrite the thing top to bottom, which is good compared to the year and a half it took the first time, when we didn't know what we were doing. But it's just not a good way to spend your summer, working weekends, because it's really deeply demoralizing. You're slaving away at something that you can't ship, and meanwhile I have emails coming in like, so, your product's neat, but it would be great if you had this. And you're like, yes, hmm, soon I'll have everything. So it was a really immense relief to ship. Phew.

So, the result. Point-in-time databases are not that hard to design in Postgres. I really highly recommend doing some reading on data warehousing or domain-driven design. I have 80% of what I need with just a couple of triggers, which is cool. And despite going back to lowly Ruby and away from web-scale-ness, our app is actually faster. It's somewhere between 50 and 100 percent faster, but I mean, we rewrote the data model, so, duh.

If you'll indulge me in a brief tangent, there's this whole sub-genre of technical blog posts that go like this: "so we rewrote our app from boring technology to amazing new shiny technology, and things are 10 times faster". And if you took all the lessons you learned the first time, you rewrote the thing, and you didn't squeeze more performance
out of it, what were you doing? What were you trying to accomplish? What's going on? I don't know.

But my main point, kind of the takeaway from the talk, is that when I went back into Ruby land I found that my Ruby is different now. All that time I spent in Clojure had really crystallized some thoughts I'd been having. I could do a real history of Ruby and Rails over the years. First it was skinny controllers, fat models, and people were like, oh no, my user.rb is just huge, I don't know about that. Then concerns came out, but concerns aren't really a good way to go, because you just have the same problem, it's just spread across 50 different files for some reason. I still have this 500-method user.rb, but spread through all these different files. That's not great. [inaudible] And then people have different ideas of how to do service objects, but they try to make them so functional that you have to call .call on everything anyways.

So, key takeaways from my experience in Clojure land that helped along existing thoughts that I had. One is that I try to be as immutable as Ruby will let me. Ruby won't help you, but you can be disciplined in what you do. One example of this is something people have been calling a value object, and it's really handy to pepper your code with them. These are objects whose sole purpose is to represent a bit of state. You pass it in when you initialize the object, and afterwards you're just not allowed to modify it. You pass it in once, and any time you try to update it, it just says: nope, can't do it. I don't use this gem, but it's a good example of what's going on: tcrayford has a little immutable value object gem you can just install. Basically the way it works is that you initialize it, and then if you try to change it later it's like: nope, not doing it.

Another thing that really
changed me personally, post Clojure land, is that I've become much more aware of how state flows through my application. I'd say more that I'm paranoid about it. This is kind of hard to articulate as a good key takeaway, but I'll give you an example of a pattern I've been using lately, something I've been calling, for lack of a better word, a quote-unquote manager object.

Managers, for me, are distinct from controllers, which I see as handling input. Briefly: a controller takes input from the user. The model strictly deals with persistence only; it speaks to the database, it queries it or saves to it, and that's it. And then the managers take the input from the user, look inside the database via the models, and mingle it all together. I'm not saying this is the one true way of doing things, or that I'm necessarily going to stick with it, but this is how a lot of my code has been looking lately.

So here's an example. Take the package manager. Pretty straightforward, very simple. We have an attr_reader at the top, which means that the one and only time these ever get set is through the constructor. You pass in all the state you need for things within this class, and then when you add instance methods you just have to be really careful to never modify your inputs and never set instance variables. A lot of my code ends up looking like this: a package manager will be initialized once per platform, and then any time I want to find packages or change them, or do anything that mixes a lot of different business logic together, it'll look like this. I can pass in an input, but I will never change it, and I never set any more state. If I have state that I have to pass along, I create a value object and things operate on that. Because, again, in Ruby anything can mutate, but none of your code has to.

All right. So if you kind of walk
away with some key takeaways from here: one is that, despite my moaning and complaining, Clojure is worth learning, seriously. It's maybe not the most transcendental thing you'll do with your time, but it was neat, I learned a lot from it, and I would happily work on someone else's project in it. And while we've personally moved away from relying on it in our business, I'm actually really excited for ClojureScript, once they get it into a state where you can just drop it into some preprocessor and dump it out. I could give a whole talk on the ways we're avoiding JavaScript in our application, but I'm not going to share my prejudices with you; that's just mean and pointless. This is all just to say that when the day comes and people are like, you know, these things need to be nicer, well, there's a lot of cool stuff coming out of this. If you've ever fought with, like, [inaudible] React, for instance, ClojureScript will just work beautifully with that.

Another key takeaway: avoid building distributed systems for as long as you can. We now have this beautiful model where if I update something at one end, I don't have to worry about how it gets to the other. Eventually, there's only so long you can do this for. Eventually you will have to have multiple things that talk to each other, and then we can talk about the CAP theorem and whatever. But it turns out you can get really far without having to worry about it, because, I don't know about you, but I can configure really beefy machines these days. If you just have the one, you can get away with a lot of traffic before you really have to think about it too much.

This is kind of a corollary of: avoid working on problems that don't matter. Does it really matter that you have these containers that you can shove in and out, if you don't really have a thousand machines they
have to be put on?

Exhaust the tools that you know before reaching for a new one. That's kind of a more pleasant way of saying it: I don't think things are boring, a boring thing is just something you know really well. So reach the end of the things you know really well first, if you're pressed for time. If you're just doing this for hacking on a Saturday morning, knock yourself out. But, you know, time's ticking down, you've got things to do.

And finally, I think most importantly, and something I've been thinking a lot about through this experience, is that code should make you happy. It was kind of a rude awakening, after spending effectively my whole career in Ruby, that the notion that programming should be fun and painless is not a universal value. This is bewildering to me. The less charitable interpretation is that some people are just happy to be clever, to feel clever in their programming, and not necessarily happy. Take it for what you will, but I think programming should be fun and painless. This is all just to say: cherish what we have here in Ruby land, and make sure you bring it, kicking and screaming, into new communities.

So that about wraps it up. This is the story, roughly, of how we built this service that will happily notify you if you're ever running vulnerable software in your stuff. My name is Phillip Mendonça-Vieira, and that's about it. Thanks very much.
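
The Ruby code shown on screen during the talk is not part of this transcript. What follows is a minimal sketch, not the speaker's code, of three of the points made about Ruby: freeze and dup are shallow, a full copy needs something heavier, and a value object plus a "manager" can stay immutable by discipline. The Marshal round trip is only an assumption about what a deep copy might look like; PackageVersion, PackageManager, their methods, and the package names and version strings are all hypothetical.

```ruby
# Sketch only: every class and method name below is hypothetical.

# String.new gives explicitly mutable strings regardless of literal settings.
list = [String.new("a"), String.new("b"), String.new("c")].freeze

# Freezing stops the array itself from changing...
append_error =
  begin
    list << String.new("d")
    nil
  rescue FrozenError => e # FrozenError exists in Ruby 2.5 and later
    e.class
  end

# ...but not its elements: the strings inside can still be mutated in place.
list.first << "!"

# dup is shallow: a new array that points at the same string objects.
copy = list.dup
copy.first << "?" # also changes list.first

# One way to get a fully independent copy: a Marshal round trip.
# (An assumption; the speaker's own version is not in the transcript.)
deep = Marshal.load(Marshal.dump(list))
deep.first << "#" # list.first is unaffected

# A value object: state is passed in once and then frozen.
class PackageVersion
  attr_reader :name, :version

  def initialize(name:, version:)
    @name = name.dup.freeze
    @version = version.dup.freeze
    freeze
  end

  # "Updating" returns a new object instead of changing this one.
  def with_version(new_version)
    self.class.new(name: name, version: new_version)
  end
end

# A "manager": configured once through the constructor, never sets further
# instance variables, and never modifies what it is given.
class PackageManager
  attr_reader :platform

  def initialize(platform)
    @platform = platform.dup.freeze
    freeze
  end

  # Returns a new array; packages and vulnerable_names are left untouched.
  def vulnerable(packages, vulnerable_names)
    packages.select { |pkg| vulnerable_names.include?(pkg.name) }
  end
end

# Illustrative data only, not real advisories.
pkg = PackageVersion.new(name: "rack", version: "1.6.0")
bumped = pkg.with_version("1.6.4")
manager = PackageManager.new("ruby")
other = PackageVersion.new(name: "rake", version: "11.0")
flagged = manager.vulnerable([pkg, other], ["rack"])
```

Calling freeze at the end of initialize is what turns "please don't modify this" into an error: any later attempt to set an instance variable on these objects raises FrozenError, so the discipline described in the talk is enforced rather than assumed.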