Wouldn’t you like to forget about the database?

So the question I have for you today is: wouldn't you like to forget about the database? You've probably already seen this slide. We all know that Ruby has an absolutely awesome community. We have awesome tools to work with that make our lives easier, and we have a lot of fun doing it. I think that's actually a really important part of working in Ruby: Matz wants us to have fun. But there's a lot of stuff we have to do all the time where we're not having as much fun as we could, and I think that comes up around databases. So I have the word "shoehorn" up there. What we do every day, millions of times a second (well, maybe not millions of times, unless you're running Node or something), is take objects that we need to persist, tear them apart, and shove them over the wire into some relational database, document database, something that was fundamentally not meant to store objects. Now, we love objects. We love the DSLs we write; they make it very comfortable to work with these things and to translate business language into technical language, and we have a lot of fun doing that. But some of these query languages, like SQL, having to know JavaScript for MongoDB, or the weird pattern-matching stuff for Redis, are not really what we want to be playing with. Again, these databases are just not really meant to store objects. That leads me to object-relational impedance mismatch. That's kind of a big term; basically it means that we have one world over here and a disconnected world over here, and we have to get them to talk. We do that using tools like ActiveRecord, Mongoid, stuff like that. You'll find the articles in the slides later, and I'd encourage you to go read them; they're pretty fascinating. So my brother is actually learning how to program. He doesn't have any experience (I think he took one SAS class, the statistical SAS, in college), and he's very excited to learn how to program Ruby. But the one thing, as a Rubyist, that I kind of dread telling him
about is how to save his objects. If you think about just a normal request, we have to go out to the database, we have to query it to build the objects back up, we have migrations, we have to maintain these databases, and it's kind of a pain in the ass, really. So impedance mismatch makes it really hard for people to learn how to write systems in Ruby. Impedance mismatch is actually captured by this term: PORO, plain old Ruby objects. I despise this term, because if you think about it, plain objects are really what we want to be dealing in. We only have the term because our normal day-to-day objects are ActiveRecord objects, or, I'm sorry, models that we have to persist with Mongoid, and so we think about the objects themselves, in those classes, in terms of the database we store them in. I think that's really unfortunate. So if we think back decades (older than me, older than a lot of us), there's Smalltalk, and Ruby is largely based on Smalltalk. Smalltalk has the notion of an image. I kind of think of the image like this, as plasma balls: the image is basically a running, living environment for your objects. When you start up an image, start running some Smalltalk, and create an object, it just lives in this image. When you close your program down, your objects are actually serialized straight to disk. The cool part is that when you bring your program back up, the objects just pop back to life. There is no query language for some external service, and I think that's a really interesting way to think about persistence. Back in 2007, Avi Bryant gave a keynote at RailsConf, and what he said was: the future of Ruby is here, it's just that we have it over in the Smalltalk world, and you guys should come and get it. I was like, wow, I don't even know what you mean by that; back in 2007 I'd maybe written one blog app or something in Ruby. What he suggested was that we petition GemStone, a Portland company that builds
enterprise-level Smalltalk products (they've been doing it for decades), to give us some of that magic. GemStone/S is a product that GemStone the company (a little confusing) has been working on for 15 to 20 years now; it's essentially a distributed Smalltalk. The company has been in existence since March 1, 1982. They're old-school, ponytailed Smalltalk guys, and they're great. So what is GemStone/S? It's a well-tuned virtual machine for running Smalltalk, and it's battle-tested. For example, JPMorgan Chase used it to do their financial modeling, and the national Slovenian gas company built their entire billing system on it, which has survived all sorts of huge changes: the deregulation of the European energy market and the introduction of VAT. When I was talking to Dale last week, he asked me: how much do you think about saving your data instead of using it? That's actually the cool part of what I'm going to talk about, which is MagLev. MagLev is a Ruby implementation that runs on top of the GemStone/S virtual machine. There's a lot of power in this virtual machine, and it all comes from something I'll talk about a little: transparent object persistence. That's the bit of Smalltalk that's really fantastic and that we don't have in Ruby; well, that we didn't have in Ruby. So this is usually what we build when we build our web apps: a bunch of smiley-faced users on their phones, maybe a front-end website, maybe some admin or other back-end apps; they talk via an API, and we store everything in something like MongoDB or Postgres. A lot of times, because that's kind of difficult, what we end up building is something a little simpler, where we cut out some of that and don't have a central API; our apps talk directly to each other. Think of all the duplication and all the code you have to write for that; it's quite a bit. Thinking about those
systems a year and a half ago, I had an aha moment. I was like, wait a minute: transparent object persistence. If all of my objects live in one space, I don't have to write APIs for my systems to talk to each other. This is the kind of thing you can build with MagLev. I like to think of that cloud as a warm, sticky ball of dough: your objects just stick inside and float around, kind of like chocolate chips when you're making cookies. So you don't have to couple your apps together through these external services; MagLev actually becomes your API, and it does that through transparent object persistence. Last night I was driving up from Portland and there was a ton of traffic, so I was listening to some podcasts. The Ruby Rogues did an episode in late June or early July on domain-driven design with David Larrabee. About halfway through, Avdi Grimm, one of the Rogues, says this; I'll read it for you, it's a little long: "I've often wished that I could just change a bunch of objects, or change a bunch of models, and at the end of the request cycle something magical goes through and collects up all the models that I have changed and persists them. And I haven't really gotten to that level yet. Sometimes I feel like ActiveRecord is holding me back." I could not believe that he said that. I didn't throw up my hands, though I wanted to, because I was driving. I was like: we have that! We have exactly what you just asked for. Technically we've had it since Halloween, but MagLev has been worked on for quite a while now. Of course there's a punchline to this, right? What was the following comment from David Larrabee? "Well, you can have that; you just have to switch to Java or .NET." And I was like, seriously? No, you don't. That would be a world of hurt right there, and luckily we don't have to enter that world. So MagLev has this idea of transparent object persistence, and it's all based on the persistent root. Basically, it's a hash: you have
keys, just like a Ruby hash; you can put anything in it, symbols, strings, and then you have a value that's tacked on. If we look at a simple example, I think you'll get what I'm talking about. Ignore the abort and the commit for a moment. What I've done is create an object; you can see that it has an object ID, of course, and I tack it onto the Jesse key of the persistent root. Now, in a totally different running instance of my application, I can say, hey, let's get a fresh view, and my object is just there. That's pretty powerful. So what's actually happening here? GemStone/S has the notion of a stone. The stone is where your data, your objects, are actually persisted. Then you have, confusingly, what they call gems, which are instances of your Ruby program. They could be running on the same machine or on thousands of different machines; it doesn't really matter. As long as they're connected to the stone, they will see the same information. GemStone/S is ACID and transaction based. So for one running gem (a virtual machine; I'll call it a gem) to get a snapshot of the data stored at that time, you say: okay, let's abort the transaction. That gets a fresh view of the data. Now you can make any changes you want, and all you have to say is commit transaction, and all the nice stuff we expect from ACID, durability and consistency, is taken care of for us. All of a sudden our changes are up in the stone, and any other VM that connects will see those changes once it aborts. Now, there's something in GemStone/S and MagLev called persistence by reachability. The idea of persistence by reachability is that I don't have to persist every object in my graph explicitly. Let's look at a really simple graph: I have this class called CatCollection, and right now the cat collection is holding my cat Pierre. Notice that I'm actually persisting the collection itself. Whoa, that
was wrong; sorry about that, I hit the wrong button. Okay, I will make sure not to hit End anymore. So on another VM, you can see that I abort the transaction and grab the persisted cats, and the first one, as you'd expect of any normal array, is Pierre. I didn't persist Pierre; I persisted the collection that Pierre was in, but he was persisted too. That's persistence by reachability. Now I can also persist things that are related to Pierre himself. He likes toys; really, he just likes balls of yarn. So there we go: we give him a ball of yarn, and the next time an app fires up, thanks to persistence by reachability, we get that ball of yarn back. So let's talk about data structures for a second. What can we easily persist today using the databases that we use? Arrays, basically relations; sets, relations with constraints; hashes, with Postgres's hstore (that's pretty cool) or Mongo; counters, where maybe you've got to lock a table, or I suppose you could use Redis for that; sorted sets, where you're running into constraints and ordering. But what if your app has something more exotic, like a k-d tree or a Bloom filter? What if you mess around with Judy arrays or leftist trees? Anything that you come up with can be persisted. You can persist procs; you can persist bindings, which is pretty interesting. You can't persist IO, but you don't really want to anyway; there are a few things you wouldn't want to persist. So I'm going to run through a blog I whipped up this morning, literally this morning. It's a basic Sinatra application. I grab the blog out of the persistent root hash, and then I grab all the posts; this could be normal ActiveRecord-style code. When I want to look at a post, I'm just using the object ID as the ID; it kind of makes sense, and it doesn't really matter. So that looks pretty simple, and that's my query: there is no SQL, there is no JavaScript, there's nothing else, it's just Ruby. When I want to add a post (okay, the Post class is nasty, I know, this was done quickly), again, I can
just create a post, tack it onto the blog's posts, and it's just saved, just persisted. It's persisted using this MagLev transaction wrapper, which is actually in the MagLev repository. All it really does is: if the status comes back 200 to 399, something that wasn't a failure, commit the transaction. We always abort coming in, and then we decide whether or not to commit on the way out. It's pretty simple, and that's exactly what Avdi was talking about; that's exactly what he wants, and it's one Rack middleware. It's pretty slick. Now let's look at the MagLev blog class itself. It's a PORO, a plain old Ruby object (I hate that term); it's just Ruby. Just as you'd expect, you've got posts. There are no fields in a database, and no declaring what the type is, as you would with Mongoid. Same thing with authors, and you could extend it to comments, all that kind of stuff. The way I bootstrapped this was a little script that said: hey, MagLev, persistently load the file, so that in the future all of my posts and authors can be persisted; and then commit. And there we go: I had a blog with no external database, and it actually works. So let's talk about something a little more useful than my shitty little blog: background jobs. Last year I did a quick spike on a background job processor in MagLev. This is what we think of as background processing with queues, right? You have some type of collection, we want to put something into it, and later we want to call it. So let's do that in MagLev. We abort a transaction, set up an array, push procs into the array, and commit the transaction, over and over again. That's the producer side of our background processor, not the actual processor, but it's pretty simple, right? On the worker side, which could be an instance of the program running on a totally different machine, all we have to do is abort the transaction, grab the first thing out of the array,
commit the transaction, and call the work. That's it. Now, there are a few issues with this one. What happens if the commit fails? What happens if two instances actually grab the same job? Also, arrays as they're built into GemStone/S aren't concurrent data structures. I'm sure there are other issues with this, but those are the fun ones. So we can simply rescue the commit-failed exception; that takes care of adding stuff in on the producer side. On the consumer side we do the same thing: if somebody has already grabbed this job, let's just try again, just redo it. That's super simple, right? So we've taken care of what happens if a commit fails. Now, arrays aren't concurrent data structures, but GemStone has a notion of concurrent data structures. There are four of them; there are Smalltalk versions, and these are the Ruby versions. They're called reduced conflict classes: a hash, a queue, a counter, and something called an identity bag, which is a Smalltalk thing that we don't really use. The counter and identity bag are not actually exposed right now, but the hash and queue are. So what are these reduced conflict classes? Think about a reduced conflict hash: it's the big hexagon. Every client that connects and wants to write to it gets its own little hash, and that's where it writes, but any time it reads, it's reading from the sum total. This allows us to write as much as we want to this data structure, and when we read it, we get the sum total. So we can use the reduced conflict queue for our jobs. Now we've got a concurrent data structure, it's exposed, and it works pretty much the same way as the array, so we've taken care of that too. You can see a little more here, and there's more in the show notes that I won't go into. But what's the point? The point is that this was 34 lines of Ruby code, including something that clears the queue. Okay, if we're getting paid peanuts, what was it, 56 thousand dollars a year? Hopefully we get paid
more than that. This would maybe cost an employer, let's say, two or three grand. You probably cost more than that just to implement Resque or maintain Sidekiq, and don't get me wrong, I love those libraries; they're great. So MagLev as it stands now targets Ruby 1.8.7 and passes a whole bunch of RubySpecs, but it still needs some help passing more of them to really get it up to 1.8.7. It does run some C extensions, but because it's largely based on Smalltalk, there's quite a bit of work to get some of the extensions to work. There are things like RSpec's described_class; we need to figure out why that's not working. And IO right now is not quite exposed; well, it is, but it needs some love. But it does run things that we depend on: Nokogiri will install (it's a special version, but it will install), bcrypt will install, json will install. You can use these libraries, so there's really no reason not to at least play around with MagLev. So what's the future of MagLev? Well, I hope that it's bright; I hope that people use it. Soon GemStone will be releasing GemStone/S 3.1.0.1. Right now MagLev is based on the 3.1.0 virtual machine, which has some known issues, so the 3.1.0.1 release will address those, and it should be coming out pretty soon. HPI, the Hasso Plattner Institute in Germany, has a class project to get 1.9 running. Let me back up for a second: hopefully the students will pick it, and most likely six to eight of them are picking to work on MagLev, in order to get 1.9 running well enough to run Rails 4. So that's pretty cool. That means you'll actually be able to stay in your Rails environment, use most things, and use some of the 1.9 syntax to build some apps. I would love to see what you guys come up with. Okay, the blog example was pretty cheap, but it worked, and the queue example is kind of slick. For shits and giggles, last year I started reimplementing Redis, because if you think about it, Redis is just a dumbed-down version of MagLev: its API is just strings as keys. That's up on
my GitHub; check it out. I think there are some really cool things we can do. It's easy to install MagLev with RVM or ruby-build, or you can install it from the install.sh script in the repo. I think I have a couple of minutes for questions, if anybody has any. Yeah, way in the back? Sorry? Oh, the performance of MagLev. So the question was: what's the speed of MagLev? I was going to include benchmarks; I had the slides all ready and everything, and then I decided not to. It is perfectly fast enough. Okay, what do I mean by that? I ran the k-d tree examples in the repo, and of course it outperformed 1.8.7; in some cases it outperformed 1.9.3. I didn't compare it against JRuby, but it also did pretty well against Rubinius. I think there's been very little to no work done on performance, and it keeps up quite well. Besides that, it has transparent object persistence, which, while it may not run your code really fast, will probably save you time in other places. Yeah? Yeah, Allen Otis; he's incredible, and he's really easy to talk to. He's somebody who's been implementing really fast Smalltalk. You know that Java's fast because of Smalltalk, right, the Strongtalk optimizations? I don't know a lot about it, but I do know that Java is fast because of research done to make Smalltalk fast. Smalltalk is fast enough, and, like Avi said back in 2007, there's no reason why Ruby can't run as fast as Smalltalk; it's basically the same language. Yeah? So, how does GemStone/S make sure that your data is durable? I don't know the exact details. Every transaction that comes in is written to the transaction log. They've been doing it for so long; I don't know the internals of the virtual machine, it just does. Yes? Right, yeah, but that was probably... I don't even know, I'm not sure when that was. So on the Seaside website they show some of the licensing tiers. With the community
version you can store as many objects as you want; it's just that not all of them will fit in memory. With the community version you can use, I believe, two CPUs, and you can have two gigs of what's called the shared page cache, which is the in-memory pool that your objects fit into. Two gigs of objects is a lot of objects, and those are just the ones that fit into memory. Any that don't get used, or don't get used as often, get persisted to disk and pulled back out when necessary. You know, I suppose you could hit that limit, but I would like to see somebody build an app that gets past that point. When I asked Monty at GemStone about it, he said that JPMorgan Chase pretty much doesn't go over the 2-gig shared page cache. Are we out of time? Okay, thank you.
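The persistent-root and persistence-by-reachability demos from the talk can be modeled in plain Ruby. This is a toy sketch, not MagLev: ToyStone and ToyGem are hypothetical names, and the model deep-copies the whole root with Marshal on every abort and commit, whereas MagLev manages shared objects in the stone itself (MagLev's own entry points are Maglev::PERSISTENT_ROOT, Maglev.abort_transaction and Maglev.commit_transaction, which this code does not call). It shows the two behaviors from the talk: a second gem sees committed data only after it aborts, and objects hanging off a persisted collection are stored without being persisted explicitly.

```ruby
# Toy model of a stone and its gems. ToyStone and ToyGem are hypothetical,
# illustration-only classes, not MagLev's API. Unlike MagLev, this copies
# the whole persistent root on every abort and commit, and the last commit
# simply wins (no conflict detection).
class ToyStone
  def initialize
    @committed = Marshal.dump({}) # serialized persistent root
  end

  # A deep copy of the last committed persistent root.
  def snapshot
    Marshal.load(@committed)
  end

  # Marshal walks every object reachable from the root, so anything hanging
  # off it is stored as well: persistence by reachability, in miniature.
  def store(root)
    @committed = Marshal.dump(root)
  end
end

class ToyGem
  attr_reader :root # this gem's private view of the persistent root

  def initialize(stone)
    @stone = stone
    abort_transaction
  end

  # Discard local changes and pick up the latest committed state.
  def abort_transaction
    @root = @stone.snapshot
  end

  # Publish this gem's view of the root to the stone.
  def commit_transaction
    @stone.store(@root)
  end
end

# Plain Ruby objects; nothing here knows about persistence.
Cat = Struct.new(:name, :toys)
Toy = Struct.new(:name)

stone = ToyStone.new
gem_a = ToyGem.new(stone)
gem_b = ToyGem.new(stone)

pierre = Cat.new("Pierre", [])
gem_a.root[:cats] = [pierre]           # only the collection hangs off the root
pierre.toys << Toy.new("ball of yarn") # the toy is reachable through Pierre
gem_a.commit_transaction

puts gem_b.root.key?(:cats)                  # prints "false": gem_b has not aborted yet
gem_b.abort_transaction                      # get a fresh view
puts gem_b.root[:cats].first.name            # prints "Pierre"
puts gem_b.root[:cats].first.toys.first.name # prints "ball of yarn"
```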
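The transaction wrapper described for the blog (abort on the way in, commit on the way out only when the status is 200 to 399) fits the usual Rack middleware shape of `call(env)` returning `[status, headers, body]`. The sketch below is a hypothetical reconstruction in that spirit, not the code from the MagLev repository; TransactionWrapper and RecordingTransactions are illustration-only names, and the `transactions` argument is assumed to be any object with `abort_transaction` and `commit_transaction`, so it runs without MagLev or the rack gem.

```ruby
# Hypothetical Rack-style middleware in the spirit of the talk's transaction
# wrapper; not the MagLev repository's implementation.
class TransactionWrapper
  def initialize(app, transactions)
    @app = app                   # the wrapped Rack app (e.g. the Sinatra blog)
    @transactions = transactions # anything with abort_/commit_transaction
  end

  def call(env)
    @transactions.abort_transaction # always start from a fresh view
    status, headers, body = @app.call(env)
    # Commit only for non-failure responses. On a failure nothing is
    # committed, and the next request's abort discards the changes.
    @transactions.commit_transaction if (200..399).cover?(status.to_i)
    [status, headers, body]
  end
end

# Hypothetical stand-in that records which transaction calls were made.
class RecordingTransactions
  attr_reader :log

  def initialize
    @log = []
  end

  def abort_transaction
    @log << :abort
  end

  def commit_transaction
    @log << :commit
  end
end

tx = RecordingTransactions.new
ok  = TransactionWrapper.new(->(_env) { [200, {}, ["saved"]] }, tx)
bad = TransactionWrapper.new(->(_env) { [500, {}, ["boom"]] }, tx)
ok.call({})
bad.call({})
p tx.log # prints [:abort, :commit, :abort]
```

Leaving the abort to the start of the next request keeps the middleware to a single decision point, which matches the "abort coming in, decide on the way out" description in the talk.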
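The background-job worker loop from the talk (abort, take the first job, commit, call it, and simply retry when the commit fails) can be illustrated with a toy optimistic-commit model. VersionedStone, Session, CommitFailed and take_job are hypothetical names, not MagLev's API, and the conflict rule is a deliberate simplification: any commit made since this session last aborted counts as a conflict, whereas GemStone detects conflicts on the objects actually written. The jobs are lambdas, echoing the talk's point that procs can be queued.

```ruby
# Hypothetical toy model of optimistic commits; not MagLev's API.
class CommitFailed < StandardError; end

class VersionedStone
  attr_reader :version

  def initialize(jobs)
    @jobs = jobs.dup
    @version = 0
  end

  # A private copy of the queue plus the version it was taken from.
  def snapshot
    [@jobs.dup, @version]
  end

  # Reject the commit if anyone else committed after this session's snapshot.
  def store(jobs, based_on)
    raise CommitFailed, "someone else committed first" unless based_on == @version
    @jobs = jobs.dup
    @version += 1
  end
end

class Session
  attr_reader :queue

  def initialize(stone)
    @stone = stone
    abort_transaction
  end

  def abort_transaction
    @queue, @base = @stone.snapshot
  end

  def commit_transaction
    @stone.store(@queue, @base)
    @base = @stone.version
  end
end

# Worker step from the talk: abort, take the first job, commit; if the
# commit fails, another worker got there first, so start over.
def take_job(session)
  begin
    session.abort_transaction
    job = session.queue.shift
    session.commit_transaction
    job
  rescue CommitFailed
    retry
  end
end

stone = VersionedStone.new([-> { "job a ran" }, -> { "job b ran" }])
w1 = Session.new(stone)
w2 = Session.new(stone)

w1.queue.shift        # both workers see job a at the head of the queue
w2.queue.shift
w1.commit_transaction # the first commit wins
begin
  w2.commit_transaction
rescue CommitFailed
  puts "w2 lost the race"
end
puts take_job(w2).call # prints "job b ran"
```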
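The reduced conflict hash is described in the talk only at the level of "every writer gets its own little hash, and reads see the sum total". The following is a minimal sketch of that idea in a single process, under that description alone; ReducedConflictHash is a hypothetical name, not GemStone's or MagLev's class, and how the real classes resolve two clients writing the same key is not covered in the talk, so the tie-breaking rule here is an arbitrary assumption.

```ruby
# Hypothetical sketch of the "reduced conflict" idea; not GemStone's or
# MagLev's class. Each writer gets its own sub-hash, and reads merge them.
class ReducedConflictHash
  def initialize
    # One private sub-hash per client, created on that client's first write.
    @per_client = Hash.new { |hash, client| hash[client] = {} }
  end

  # Writers never touch each other's sub-hash, so writes from different
  # clients never modify the same object.
  def store(client, key, value)
    @per_client[client][key] = value
  end

  # A read sees the "sum total" of every client's writes. If two clients
  # wrote the same key, the client whose sub-hash was created later wins
  # (an arbitrary choice for this sketch).
  def merged
    @per_client.values.reduce({}) { |total, writes| total.merge(writes) }
  end

  def [](key)
    merged[key]
  end
end

jobs = ReducedConflictHash.new
jobs.store(:gem_1, 1, "resize image")
jobs.store(:gem_2, 2, "send email")
puts jobs[1]          # prints "resize image"
puts jobs.merged.size # prints "2"
```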