How Neo4j Saved My Relationship


I was a little nervous when I found out there were going to be 500 people here. Then I learned that one person couldn't make it, so 499, and that's kind of reasonable.

So, I'm Coraline Ada Ehmke, and I'm very happy to be in the UK. You may not know this about me, but I'm actually an amateur UK-ologist. I've been tweeting all week since I got here with the #UKology hashtag. It's for the benefit of my colleagues back in America, to share some facts about your country with them, all of them absolutely true.

I am a Rubyist. I've been doing Ruby since 2007, and I'm the author of about two dozen Ruby gems. You can learn all about my work at where.coraline.codes, which I think is the most awesome vanity URL ever. I'm on the Twitters as @CoralineAda. As mentioned, I'm the author of the Contributor Covenant, which is the most popular open source code of conduct in the world, with over 14,000 adoptions, including Rails, JRuby, and Elixir. You should take everything I say today with a grain of salt. I'm speaking from the perspective of a liberal, progressive, transgender feminist who holds the controversial opinion that people should not be shits to each other online. So you've been warned.

So we're going to talk about databases today. We have a lot of options when it comes to selecting a database, but in the Rails world especially we always seem to reach for Postgres, and Redis if we're feeling really experimental. So why is that? Traditional relational databases were designed to model paper forms and tabular data. You can easily imagine a table in a relational database being populated by a form in a GUI. But who says that tables are the best metaphor for how we store our data? We have a lot of different kinds of databases at our disposal, and the one I want to talk about today is graph databases.

So we'll start by answering the question: what is a graph database? Graph theory was pioneered by Leonhard Euler in the 18th century and has been actively researched
and improved upon by mathematicians, sociologists, and other scientists in the intervening centuries. It's only in the past few years that graph theory and graph thinking have been applied to information management. Part of that is driven by the massive commercial success of companies like Google, Facebook, and Twitter, which use proprietary graph technologies to drive their solutions. In recent years we've also seen general-purpose graph databases enter the technology landscape.

So what kind of data is suited to a graph database? Data with context: data where, in order to understand the context, we need to name and qualify the connections between things.

In a graph database we're dealing with two kinds of things: nodes and edges. Nodes are entities. If you insist on thinking in relational terms, you can think of a node as a table, except that it's a table that can hold absolutely anything; it's object-agnostic. Edges are relations. Relations are named and they're directed, so they always have a start node and an end node. Nodes contain key-value pairs, and relations can also contain key-value pairs, so you can stick metadata on relations. Relations in a graph database are first-class citizens, which means that we can actually query on them, unlike join tables, which are not easy to query.

So why would you choose a graph database? For intensive data relationship handling, graph databases have superior performance over relational databases. With a traditional database, query performance for relational data worsens as the number of records and the depth of the relationships increase. With a graph database, performance stays roughly constant even as the overall data grows, because a query only touches the part of the graph it traverses. The structure of a graph model flexes as our applications change and grow. Graph databases are schema-less, so you don't have to mess around with a lot of migrations. They're also object-agnostic: they don't care what kind of data is stored in a node. Graph databases
are well suited for agile methodologies, because we don't have to model the domain completely ahead of time, and we can change our minds. We can change the existing graph structure without modifying the data or the functionality we already have.

There are some typical use cases, which I'm not going to get into right now. Instead, I'm going to do this: it wouldn't be a graph database talk without modeling a social network. It's actually a requirement, and I don't want to lose my license, so bear with me.

In a relational database, we might model Twitter accounts with a table that has a name, a handle, and of course an ID. We might also have a join table for followers with a follower ID, a person ID, and maybe some metadata, like when they followed. In a graph database, we use nodes and edges: nodes contain users, and edges are the relations between users. So we might have a relation called FOLLOWS, and we could stick the followed_on metadata on that relation.

But when two or more domain entities interact, like Coraline Ada and someone else in this example, facts emerge. Relations and their metadata are facts, and we can represent these facts as separate nodes with connections to the entities engaged in the fact. So we might actually model it with a Follow node, with one relation between a person and the Follow and another between the Follow and the other person. Once facts are represented as first-class citizens, we can ask questions like "How many people did someone follow on December 31st?" or "Who has someone been following for less than a month?"

Notice that we're not interested in the kind of thing that's stored in a node; the graph database doesn't care. We have things that can be described as users, things that can be described as follows, and two different kinds of relations. This is different from a relational database, where we'd have to store each of these things
separately.

So today I want to talk about Neo4j, which is a specific graph database. It was created by a company called Neo Technology. Neo4j is open source and it's written in Java. Interestingly, the native interface for Neo4j is over HTTP, and we're going to see an interesting side effect of that in a little bit. Neo4j was released in 2010, and it comes in three editions. The Community Edition is free to use but limited to a single node. The Enterprise Edition gives you clustering, hot backups, and monitoring. There's also a Government Edition, which is like the Enterprise Edition but comes with some additional certifications.

Unlike some alternative databases, Neo4j is ACID compliant. That means it gives us atomicity, consistency, isolation, and durability: guaranteed reliable database transactions.

Neo4j stores graph data in a number of different store files. Each store file contains the data for a specific part of the graph, and dividing the storage responsibilities this way makes graph traversals highly performant. The node store is a fixed-size record store where each record is exactly nine bytes long. Fixed-size records enable fast lookups: if we have a node with an ID of 100, we know that its record begins 900 bytes into the file. Based on this format, the database can compute a record's location in O(1) time, rather than having to search for it in O(log n).

The first byte of a node record is the in-use flag, which tells the database whether the record holds a live node or can be reused. The next four bytes are the ID of the first relationship connected to the node, and the last four bytes are the ID of the first property for the node. So the record is pretty lightweight; it's just a couple of pointers. The relationship store also consists of fixed-size records, and in this case each record is 33 bytes. Each relationship record contains the IDs of the nodes at the start and the end of the relationship, a pointer to the relationship
type, and pointers to the next and previous relationship records. These last pointers are part of what's called the relationship chain. So the storage is very efficient, and it's designed for high-performance operations and fast queries.

So how do we query in Neo4j? We don't use SQL; we use a language called Cypher. I'm going to walk through the CRUD verbs to show you what Cypher looks like.

Our first operation is create. We're creating a GraphPerson node, a, with the name Coraline and the handle CoralineAda. We'll also create a node b, which is a GraphPerson with the name Bath Ruby and the handle BathRuby, with no spaces. Then, to create a link, an edge, between the two, we match a and b where a's name is Coraline and b's name is Bath Ruby, and we create a FOLLOWS relationship. To illustrate what we created, we return both nodes with the FOLLOWS relationship in between. The thing I want to point out with that query is that we're actually using ASCII art to draw the relation between a and b as an arrow. Because relations are directional, we can draw which direction the connection goes, which is pretty cool.

For a read operation, we match node a that FOLLOWS node b, where node a's name is Coraline. Then we return node a, the follow, and all instances of b.

For an update operation, we match node a with the name Coraline, set a's name to Codewitch, and return node a.

Delete is a little more complicated. We match node a where a's name is Coraline, and we detach and delete a. Because an edge has to have a start and an end node, we can't just delete a node. We also have to detach it from all of its existing edges as part of the delete operation, which is what DETACH does.

So you're probably used to a database query interface that looks
something like this, for Postgres. So what kind of console does Neo4j give us? This is pretty cool, and it's where that native HTTP interface comes in handy: Neo4j's console is actually a single-page web app.

Let's try a friends-of-friends query against our sample Twitter data set and see what that looks like. We match a follower of type GraphPerson, connected to a depth of two, through a relationship we call follows, to a node user, where the user's name is the one we're looking for. Because we don't want to return a whole bunch of records, we return the follower, the follows relationship, and the user, and we limit it to one hundred records.

Here's the cool part. Check that out; isn't that awesome? We get a graph back. Our data is a graph, so why not get a graph back and see the relations between things? The node we queried is at the center of the graph, then come its immediate followers, and then the followers of those followers. You can drag the nodes around, and you can double-click on one to see all the metadata associated with it.

Let's try it again with friends at a depth of three and check out the performance. I just hit the up arrow to recall the same query from history and changed the depth to three. I ran it again, and it's still super fast. Awesome.

So we're Rubyists. How do we do Neo4j in Ruby? We use a gem, of course: neo4j.rb, which is an ActiveModel-compliant ORM for the Neo4j graph database. Because it's ActiveModel compliant, if you know Active Record you can get around pretty easily in neo4j.rb.

So let's take a look at some code, again from our Twitter example. We have a GraphPerson; notice that we're including Neo4j::ActiveNode. Unlike Active Record, this is not an inheritance situation, which I actually like better: we're adding behavior to our class. We can define our class however we want, and we don't have to
say that our class is an Active Record, so I like includes better for that sort of behavior. We declare a couple of properties, so this is a declarative schema. We declare a name, which we index because we're going to be searching on it, and a handle. Then we define our relations: we have many outbound edges, which we call followers, and they relate to the class Follower.

In our edge class, we include Neo4j::ActiveRel. I hate the fact that it's ActiveRel instead of ActiveRelation; I think abbreviations in code are of the devil. We set the from_class to GraphPerson and the to_class to GraphPerson, and the type, or label, for the relationship is FOLLOWS. You could add metadata properties here too, if you wanted to.

In GraphPerson we had a method called friends_to_depth, and this is what it looks like when you build a Cypher query in your Ruby code. We call query_as(:w), which is just a handle to the query. We match a follower with a followers relationship, to a depth that we specify, to a user with our handle. We skip the direct relations and go straight to the nested ones, return distinct followers by handle, and then map to follower. The reason we map is that queries are lazily evaluated. When you call the method, you actually get a query object back. When you evaluate it, you get a struct, and the struct has follower objects inside of it. So if you want to get at the actual records, you need to do a map.

So let's see this in the REPL. We'll take a GraphPerson, just the first one, and assign it to gp. Then we'll call gp.friends_to_depth (if I can type) with a depth of 2, and there are our graph people. We can do depth three, which is fast, and even depth four, which is super fast. That's pretty cool.

Speaking of performance, it's worth comparing Neo4j with a relational database.
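Here is a minimal plain-Ruby sketch of the friends_to_depth traversal that runs without a Neo4j server. It is not the neo4j.rb API: FollowGraph, follow, and this version of friends_to_depth are hypothetical stand-ins.

Each person keeps a list of their own followers. This loosely mirrors how a Neo4j node record points straight at its relationships, so a breadth-first walk only touches the people it actually reaches. The sketch records each person at their shortest distance and leaves out direct followers, which is an assumption about what "skip the direct relations" means. A Cypher variable-length pattern such as -[:FOLLOWS*2..3]-> matches paths rather than shortest distances, so its results can differ.

```ruby
# Hypothetical in-memory stand-in for a FOLLOWS graph; not the neo4j.rb API.
class FollowGraph
  def initialize
    # handle => handles of the people who follow it (inbound FOLLOWS edges)
    @followers = Hash.new { |hash, handle| hash[handle] = [] }
  end

  # Record a directed edge: `follower` FOLLOWS `followed`. Returns self.
  def follow(follower, followed)
    list = @followers[followed]
    list << follower unless list.include?(follower)
    self
  end

  # Distinct handles reachable in 2..depth hops along inbound FOLLOWS edges,
  # counted at their shortest distance. Direct followers (distance 1) and
  # the user themself (distance 0) are skipped.
  def friends_to_depth(handle, depth)
    distance = { handle => 0 }
    frontier = [handle]
    (1..depth).each do |hop|
      next_frontier = []
      frontier.each do |person|
        # fetch avoids creating empty entries for people with no followers
        @followers.fetch(person, []).each do |follower|
          next if distance.key?(follower) # already reached by a shorter path
          distance[follower] = hop
          next_frontier << follower
        end
      end
      frontier = next_frontier
    end
    distance.select { |_person, hops| hops >= 2 }.keys.sort
  end
end

graph = FollowGraph.new
graph.follow("ana", "coraline") # ana follows coraline
graph.follow("ben", "coraline") # ben follows coraline directly...
graph.follow("ben", "ana")      # ...and also follows ana
graph.follow("dee", "ana")
graph.follow("cai", "ben")
graph.follow("eve", "cai")

p graph.friends_to_depth("coraline", 2) # prints ["cai", "dee"]
p graph.friends_to_depth("coraline", 3) # prints ["cai", "dee", "eve"]
```

Ben never shows up, even though he is also two hops away through Ana, because he follows Coraline directly.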
For this friends-of-friends query, there's been some research. This table shows the depth on the very left, then the time in seconds for a response from a relational database, the time in seconds from Neo4j, and the number of records returned. At a depth of two, returning 2,500 records, a relational database and a graph database perform about the same. But you can see that the relational database gets worse and worse the deeper we go. Now, to be fair, Postgres has added recursive queries (recursive common table expressions, WITH RECURSIVE), which improve the performance of queries like friends-of-friends, but Neo4j is still significantly faster.

So I've shown you the requisite Twitter graph, the social graph, as an example of how to use Neo4j. But Neo4j can also be used for some pretty advanced data modeling, and I'm going to introduce you to Sophia, my artificial intelligence side project. Sophia features natural language processing with semantics as well as grammatical mapping. My goal in designing Sophia is that she will be able to comprehend and even create metaphors. In short, I want her to be able to dream.

So how did I model data in Sophia? Core abstract concepts are called contexts. In my management application, here are some contexts, like animal, beauty, color, and difficulty. If we look at a specific context like beauty, you can see we have these roots. Roots are basically expressions of a context; they have a base form and they have grammatical forms. They also have a positivity, which says whether they're a positive or a negative expression of a context. For temperature, hot would be a positive expression and cold would be a negative expression. Because of the positivity ranking on roots, it's really easy to derive synonyms and antonyms just by comparing those rankings. So we have a list of synonyms here, and some antonyms. If a word matched exactly the positivity
ranking of this word, it would show up in the related terms. At the bottom we have parts of speech. The concept of beautiful can be expressed as an adverb, beautifully, or as an adjective, beautiful, and we have some metadata associated with those parts of speech. Beautifully modifies manner, and it's a verbal adverb. The adjective has a comparative form, more beautiful, and a superlative form, most beautiful.

So how did I model this in Neo4j? We'll do a query that looks at a context and its roots: the core concept and how it's expressed. We match node a, which is a Gramercy::Meta::Context with the name Beauty, with a directional relation to node b, which is a Gramercy::Meta::Root. We return the context, the relation, and the root, limited to 100 records. Hopefully you can read that: we have Beauty in the middle, and each of the things radiating off of it is a root.

Now let's go from contexts and roots all the way down to parts of speech. In this case we match node a, which is a Gramercy::Meta::Context with the name Beauty, with a directional relation to a Gramercy::Meta::Root. Then we go one layer deeper, with a directional relation r2 to node c, which is a Gramercy::PartOfSpeech::Generic. We return all of the nodes and relations that we referenced. This is a simpler graph, because we're just looking at roots for beauty. We have a couple of roots hanging off of it, with their grammatical expressions as the green nodes. The red nodes are the roots, and the orange node is the context.

Sophia also understands is-a and has-a hierarchies, which are a way of modeling facts. I have this fact explorer, which is a list of facts that she knows. She knows that a cat is a mammal, that a cat has fangs and a tail, that Elfie is a cat, and that Elfie is adorable (I wish I had a picture of her). She also knows that mammals are animals that have fur. So we can query facts using natural language: the parser knows we're
asking a question, and the context of the question is animals. It figured out that the subject is cat, the verb is have, and the predicate is fangs, and the answer to "Does a cat have fangs?" is yes. Because a mammal is a living thing and an animal, two contexts come up in this case: living thing and animal. The facts that a cat has a tail and that a cat is a mammal let the parser know that some mammals do have tails, like cats, for example.

We can also add new facts by making declarative sentences. We say "Elfie is cool," and she responds, "I'll remember that." If we look at the bottom at how she parses the sentence, she knows that it's a statement: the subject is Elfie, the verb is "is", and the predicate is "cool". For the context, she knows it's an animal, because facts about Elfie already exist: she's a cat, a cat is a mammal, and a mammal is a living thing. But Sophia is not able to determine what the word cool means. Cool is an expression of temperature but also of disposition, and there's not enough context in this sentence to distinguish between the two. If we had additional words, like "She's cool and interesting," then Sophia would understand that we were talking about disposition.

And again, we can ask questions that require inference. So, is a cat cool? We said that Elfie is cool and that Elfie is a cat, so Sophia knows to say that some cats are in fact cool.

So how is the is-a hierarchy structured? Let's look at another query. We match node a, which is a Category. A category is just a node that has relations to multiple nodes, which are called objects, and it also has a relation to a component. A component is something that makes up an object, and objects also have characteristics, which are things that describe them. So we can see how cat, mammal, and animal are connected, and we can see that cats have tails and fangs and that mammals have fur. So that's my AI project, Sophia.

So for Sophia, a graph database
was an obvious choice for me because of the complexity of the data I was modeling, and it's paid off quite well. But when should you use a graph database?

- When your relations are complex.
- When the metadata around a relation is just as important as the relation itself.
- When your data is deeply nested.
- When performance is really important to you. Again, a graph query's cost depends on how much of the graph it traverses, not on the size of the whole dataset.
- When you want to move very quickly. Graph databases are schema-less, so they let you change your mind quickly during development.
- And, let's be honest, when you like bright, shiny things. We're developers; we're attracted to bright, shiny objects, and we should be honest about that.

So give it a try. It's a quick download, though installation takes several minutes, so you might want to grab a cup of coffee or tea. There's really good documentation, and the neo4j.rb gem is very solid. Because it's ActiveModel compliant, it's easy to get up and running.

New tools encourage us to explore and to play. Personally, this is how I learn best: by leaving behind my assumptions about how things are supposed to work and finding new ways to make them work. From my perspective, this is one of the best parts of being a developer: being constantly encouraged to learn and to grow. So give graph databases a try. Even if you don't end up using one in production, you will challenge some of your assumptions about data modeling and pick up some neat new tricks along the way. Hopefully I've inspired your curiosity and you'll go ahead and try it out. Thank you very much.