Dōmo arigatō, Mr. Roboto: Machine Learning with Ruby


First, thanks so much for coming to this talk. It really means a lot to me that you're willing to spend your very valuable time here, so thank you, and thank you to the conference organizers, to Cincinnati, and to everyone who is speaking and attending and doing everything that makes this community fantastic. I'm super glad to see you here. Give yourselves a round of applause.

So this talk is called "Dōmo arigatō, Mr. Roboto: Machine Learning with Ruby." I gave it at RubyKaigi a couple of months ago, so I guess it's really "dōmo arigatō," but my Japanese is non-existent, so I'll say "domo arigato" and just apologize for it. This talk is for my younger brother Josh, who passed away unexpectedly this summer.

All right, part zero. This is a computer talk, so you have to start with zero. I tend to speak very quickly, more so when I'm excited, and talking about Ruby and machine learning is super exciting, so I'm going to try to slow down and go at a normal pace. But if you hear me going off the rails a bit, please wave or give me some kind of signal to dial it back; make it big so I can see it from here, because it is super bright up here, which is awesome, but I feel like I'm staring into the sun. Feel free to shout, too; that's also fine. I'm going to talk for about 35 minutes, and then we'll have some time at the end for questions. It's funny: I gave a talk last year at RubyConf on garbage collection, and I was all set, ready to go, the talk was all buttoned up, I'd finished my slides multiple days before (which never happens), and Matz came in and sat down front-row center right before I started, and I blew through the entire talk in something like 25 minutes. So I will try not to do that here. I've actually been practicing all my talks now imagining Matz is in the room. I don't think he is, but if he does show up, I'll be prepared.

My name is Eric. I'm a software engineer slash manager at Hulu, which a friend of mine has cheerfully called "Netflix with adverts," which is not wrong. You can find me on GitHub, Twitter, et cetera, in this weird human hash that I felt the need to make. I write a lot of Ruby and JavaScript for work, even a little bit of Go, which is nice, and my side projects tend to be Ruby or Clojure or Elixir. I'm also a newly minted contributor to a project I'm really excited about, so if you haven't heard of it or you're wondering what it is, come find me after the show; it's a lot of fun. I've been writing Ruby for about five years, and about a year ago I wrote a book called Ruby Wizardry, which teaches Ruby to eight-to-twelve-year-olds. If you're interested in that, also come see me; I'm happy to talk about it. I'm out of stickers, but we do have a 30% off promo code from the folks at No Starch (so thanks also to them). At any point this week, if you want to buy the book online, go to nostarch.com and use that promo code and you'll get 30% off.

Cool. This is not a long talk, but I think we still benefit from an overview of where we're going. I'm going to talk a bit about machine learning generally, a bit about supervised learning in particular, and neural networks in particular, and even more particularly than that, machine learning with Ruby on the MNIST data set, which we'll talk more about in a second. But first: machine learning.
Show of hands: how many of you feel very comfortable with machine learning, like in fifteen seconds or a sentence you could tell me what it is? Okay, great. What about supervised learning? Cool. What about neural networks? Okay, interesting: there was a weird difference there. Some people are "I don't know anything about machine learning, but neural networks, yes," which I thought was pretty cool.

The good news is that if you didn't raise your hand, you will still be fine. This talk is introductory. You do not have to be a mathematician to do machine learning. It does help if you know high-school-level stuff, like basic stats, first-year calculus, or a bit of linear algebra; it helps a lot with understanding the way machine learning algorithms work and how everything is put together. But it's not necessary to use the tools we're going to look at, and it's not necessary to understand the content of this talk.

This is what I think of when I think of machine learning and AI and robot stuff. It's actually a dumb little drawing I did for the Ruby book, and it's probably not what you think of when you think of machine learning, and probably not robot pirates. So what is it? If I had to pick one word to describe machine learning, to explain what it is, I would say it's generalization. The idea is that you get a program to assemble rules for dealing with things in the world, for dealing with data, in such a way that it no longer has to be explicitly programmed in order to make generalizations. What do I mean by that? One way to think of it is pattern recognition. Maybe we go outside and I say, "Okay, that's a car, and that's a car; that is not a car; that is a car." We do this for a while, and then we go to another part of Cincinnati, or we go to Norway, or we go to the moon, and I say, "Okay, is that a car? Is that a car?" The idea is to see whether you generalized. You don't just have a list of things that are cars, where if it's not on the list you don't know. You've built a concept of a car, a sense of car-ness that you can look at the world with, and this car filter tells you whether you think something is a car or not. We want the machine to tease out these underlying patterns in some data set and group them, cluster them, accordingly, or make a prediction about what they are.

This first thing I've described, making predictions based on labeled information (having a list of "this is a car, this is a car, this is not a car"), is supervised learning: you perform classification or regression based on some data set, and you generalize from the data whose labels you know to data you haven't seen before. In terms of classification and regression: classification is very much like the car example, "is this a car or not?" Regression is, for example (I think the canonical one is housing data), a plot that says, for some given feature like square footage or proximity to a good school, here's how housing prices change with that feature. So you have this scatterplot, and you say, "Okay, I'm going to put a new point down; how much should this house cost?" If you've ever done linear regressions, or lines of best fit, where you have a cloud of points and a line or a curve that fits the data, that's the idea: you're performing function approximation, finding a curve that explains the data.
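To make the regression idea concrete, here's a rough sketch, not code from the slides, of fitting a line of best fit with ordinary least squares in plain Ruby. The housing numbers are made up purely for illustration:

```ruby
# A minimal least-squares "line of best fit" sketch in plain Ruby.
# The housing numbers below are made up purely for illustration.
square_footage = [850.0, 1200.0, 1500.0, 1800.0, 2400.0]
price          = [120_000.0, 175_000.0, 210_000.0, 260_000.0, 330_000.0]

n      = square_footage.length
mean_x = square_footage.inject(:+) / n
mean_y = price.inject(:+) / n

# slope = covariance(x, y) / variance(x); the intercept follows from the means
covariance = square_footage.zip(price)
                           .inject(0.0) { |acc, (x, y)| acc + (x - mean_x) * (y - mean_y) }
variance   = square_footage.inject(0.0) { |acc, x| acc + (x - mean_x)**2 }

slope     = covariance / variance
intercept = mean_y - slope * mean_x

# "Regression" is just asking the fitted line about a point we haven't seen before.
predicted = slope * 2000.0 + intercept
puts "Predicted price for 2,000 square feet: ~$#{predicted.round}"
```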
And I think that's another key attribute: we're explaining data, finding patterns that human beings can't necessarily tease out on their own.

In order to do machine learning, we have to think in terms of features and labels. The MNIST data set, which we'll talk more about, is a database of tens of thousands of handwritten digits, and the features here are actually raw pixels (I'll have a visualization for that). The idea is that you can think of an image as a vector, an array of intensities. If it were purely black and white, you could almost imagine it as ASCII art: zero, zero, zero where it's blank, and a one where there's a black pixel. You can unwind this across the line breaks and get one big vector, and those are our features, this list of pixel intensities. The labels we have for these features are digits, the numbers 0 through 9.

So we need to first divide our data along those lines, identify what the features are and what the labels are, and then divide the data into a training set and a test set. The training set is the data the machine learning algorithm will operate on and learn from, and then there's a test set the machine will look at and make predictions about. In machine learning, it's generally considered cheating to feed the test data to your algorithm during training, because you're not really proving that you can generalize if you're testing the machine on the exact instances it has already seen. But maybe you work for a machine learning shop and one of your products is making recommendations, and your test data come in all the time; then it might be reasonable to say, "Okay, last year's test data: let's train the machine on that, see if it gets any better, pick some new test data, and keep going." That's another thing I want to call out, this notion of memorization as opposed to generalization, and generalization is the goal. Like I said, the digits 0 through 9 are the labels, and that's what we're going to try to predict: given some image that represents a handwritten digit, is it a 0, a 1, a 2, and so on. And this is what I mean by a vector of intensities: we might have a handwritten image that gets mapped to a vector where you have zeros for the absence of colored-in pixels and some small nonzero real value for the pixels that are lit up.
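To make the features-and-labels idea concrete, here's a rough sketch (again, not the app's code, and the tiny 4x4 "image" is made up) of how a grayscale image becomes a flat vector of intensities, plus the ten-slot label vector that matches a ten-neuron output layer:

```ruby
# A tiny made-up 4x4 grayscale "image": 0 is blank, 255 is a fully dark pixel.
image = [
  [  0, 200, 210,   0],
  [  0,   0, 190,   0],
  [  0,   0, 180,   0],
  [  0,   0, 220,   0]
]

# Features: "unwind" the rows into one long vector and scale intensities to 0..1,
# the same way a 24x24 MNIST-style image becomes 576 inputs.
features = image.flatten.map { |pixel| pixel / 255.0 }

# Label: the digit this image represents (say it's a handwritten 1),
# expressed as a 10-element vector, one slot per possible digit.
label = 1
desired_output = Array.new(10, 0.0)
desired_output[label] = 1.0

p features.length   # => 16 inputs for this toy image
p desired_output    # => [0.0, 1.0, 0.0, ...]
```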
Cool. So we've talked a bit about machine learning and, in particular, supervised learning, which is labeling unknown data based on data we've seen before. But what's a neural network? It's a machine learning tool modeled after the human brain, which is not super helpful; it reminds me of that talk I think Aaron gave on virtual machines. They are machines that are virtual; this is a network that is neural. So let's start with artificial neurons. Biological brains are made up of biological neurons, and in much the same way, neural networks are composed of perceptrons, or neuron units. A perceptron is a very simple little artificial neuron, and the idea is that it's almost like a function. A biological neuron has a bunch of dendrites: synaptic messages come in through this little branchy thing on the left, those inputs go into the body of the neuron, and what comes out, what gets transmitted along the axon, is the output. The neuron fires or it doesn't. You can model this as a function: think of the dendrites as vectors of signals and weights (what is the signal, and how important is it?), and the axon as the output. Most perceptrons will threshold: you set some threshold and say, if we're above this value, output one, we fire; if we're below it, output zero, we don't. There are models where you just pass along a real value instead of thresholding, but in this particular example you can see the little step-function symbol that indicates thresholding is happening.

During training, we initialize all these neurons to have small random weights, and then we start looking at the data. As we train, the network is told, "You thought this was the number four; it's actually the number seven." For each epoch of training, each iteration, we're going to use a feed-forward neural network, meaning we feed all the data through and there are no cycles inside the network, and then we do something called backpropagation. The idea is you take a stab at the data, you see where you screwed up, and then you propagate that error signal back through the layers of the network, tinkering with the weights as you go, figuring out, "If I change this weight by this much, now I'll call that a four when it's a four; if I change this weight by this much, now it's a three when I say it's a three." We keep doing this over and over, either until we get tired of it (which happens sometimes) or until we hit some predetermined threshold and say, "All right, our error is low enough; maybe we don't perfectly categorize every single thing in our training data, but we get 99% of it right, and that's good enough." Like I said, we initialize the weights to small random numbers, for two reasons. One, if you don't know what the data are going to be, there's no sense biasing the network by picking values that might introduce error. But also, when you have very high weights, you tend to overfit. We'll talk in a minute about overfitting, but the idea is that you don't want to believe your data too much, if that makes sense.
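Here's a rough single-neuron sketch in plain Ruby (not code from the slides): a weighted sum of the inputs plus a bias, pushed through a step function, with the classic one-neuron update rule standing in for the full backpropagation a real network does:

```ruby
# A single artificial neuron: weights, a bias, and a step (threshold) function.
class Perceptron
  def initialize(num_inputs, learning_rate = 0.1)
    # Start with small random weights so we don't bias the neuron up front.
    @weights = Array.new(num_inputs) { (rand - 0.5) * 0.1 }
    @bias = 0.0
    @learning_rate = learning_rate
  end

  # "Fire" (1) if the weighted sum of inputs clears the threshold, else 0.
  def predict(inputs)
    sum = @bias + inputs.each_with_index.inject(0.0) { |acc, (x, i)| acc + x * @weights[i] }
    sum >= 0.0 ? 1 : 0
  end

  # Classic perceptron update: nudge each weight in proportion to the error.
  # (A multi-layer network does something similar via backpropagation.)
  def train(inputs, target)
    error = target - predict(inputs)
    inputs.each_with_index { |x, i| @weights[i] += @learning_rate * error * x }
    @bias += @learning_rate * error
  end
end

# Teach it a trivially separable pattern: fire only when both inputs are on.
neuron = Perceptron.new(2)
examples = [[[0, 0], 0], [[0, 1], 0], [[1, 0], 0], [[1, 1], 1]]
100.times { examples.each { |inputs, target| neuron.train(inputs, target) } }
examples.each { |inputs, _| puts "#{inputs.inspect} => #{neuron.predict(inputs)}" }
```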
Cool. So we've talked a bit about perceptrons, and this is how you might organize perceptrons, or artificial neurons, into a neural network. Generally they look something like this: you have an input layer that corresponds to the size, or shape, of the number of features you care about; then a hidden layer, which is a hyperparameter you can tune (that's a fun word I like to say). The idea is you can turn the knob and say, "I'm going to have 100 neurons in this hidden layer. 200. Okay, 50," and move it around to see how your neural network behaves on the data. One downside of neural networks is that they're notorious black boxes: you generally don't see the weights the network assigns, and they wouldn't mean much to you if you did, so it's not really clear what's going on inside this little learning machine. But you can turn some knobs and pull some levers, and tuning the number of hidden nodes is one of them. Finally you have an output layer, and its size, the number of neurons there, corresponds to the number of labels you expect. If your labels are the digits 0 through 9, you'd expect to have ten output neurons, since there are ten possible labels that might come out.

There are other things you can tune in neural networks. We're not going to talk a whole bunch about them, but I'm happy to chat later. One is the learning rate, which is how quickly the machine actually learns. If you think of minimizing error as a surface, and you want to get all the way down to the bottom, you can imagine a kind of mesh, like those cartoons they show you when they try to show you what gravity looks like, and you're trying to get to the bottom of that bowl. If your learning rate is very large, if you take large strides toward the bottom, you can kick around and bounce inside the bowl without ever actually getting to the bottom. Lowering the learning rate to a smaller number means you train longer and it takes more time, but you're less likely to bounce around looking for that minimum error, that global minimum. And if you've heard of deep neural networks: those are neural networks with more than three layers, which is kind of a lie, but basically, instead of one hidden layer you might have 10 or 50 or 100, and all these papers you see now from Google and folks like that who are working with deep neural networks are building very elaborate architectures, which I know a tiny bit about and am happy to chat about; if you have experience there, I'd like to talk to you after.
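The learning-rate intuition is easy to see with a toy example. This is just a sketch, not anything from the app: the "error surface" is a made-up one-dimensional bowl, (w - 3) squared, whose bottom is at w = 3:

```ruby
# Toy error surface: error(w) = (w - 3)**2, minimized at w = 3.
# Its gradient (slope) is 2 * (w - 3).
def gradient(w)
  2.0 * (w - 3.0)
end

def descend(learning_rate, steps: 25)
  w = 0.0 # start far from the minimum
  steps.times { w -= learning_rate * gradient(w) }
  w
end

puts "learning rate 0.1 -> w ends near #{descend(0.1).round(4)}" # walks steadily down to ~3.0
puts "learning rate 0.9 -> w ends near #{descend(0.9).round(4)}" # overshoots past 3 each step, but still settles near 3.0
puts "learning rate 1.1 -> w ends near #{descend(1.1).round(4)}" # each step overshoots further; it bounces out of the bowl entirely
```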
Cool. So that covers the theory behind machine learning, supervised learning, and neural networks, and now we can move on to looking at the data. We're going to look at the MNIST data set; we're going to use a library called the ruby-fann gem (FANN is the Fast Artificial Neural Network library, which is written in C, and the gem is a set of bindings to that library); and then we'll talk about developing an app that lets us actually take advantage of the network we've trained and ask, "Now that you've done all this training, is this a two? Is this a four? Is this a five?" and so on.

Our data, as I said, are images of handwritten digits, and they've all been size-normalized (they're all the same size) and centered, which means you don't have to worry about convolving or scrolling over the image to figure out what's there. There are deep neural network architectures that can take a little magnifying glass, go over an image, and find stuff anywhere, but this is much simpler: we center the image for the machine so it doesn't have to go hunting around. There are 60,000 training examples in the MNIST data set and 10,000 test examples, so effectively we're going to take tens of thousands of examples, run over them a bunch of times, and then have 10,000 tests: is this a one, is this a two, and so on. The data set is open source and available online. If you go to this URL you'll find it, if you search "MNIST" you'll find it, and when I tweet out and share the slides you can follow the links there too.

This is a sample of what these data look like. As you can see, there are some loopy, sideways, weird-looking zeros; there are curly twos and square-bottomed twos, and one that looks like a Z. This is just a sample of human handwriting, and some of these I think even people might have trouble with, but the idea is that the machine can tell: okay, it's a closed-top four or an open-top four, it's still a four; it's a nine that's curly, or it's a circle and a line, it's still a nine.

So I went ahead and trained a neural network on the MNIST data set (I've done this a few times), and I wanted to see how well we could do. We'll take a look at the ruby-fann output in a second, but the idea is that for some set of parameters (for the number of epochs I picked a thousand, though it never needs a thousand, because it reaches the minimum error I set before then), I'm working over these data asking, what's the best we can do? At some point it gets something like 99.99 percent of the data it trains on correct, which is what we want, but on the test data it only gets about 93 percent correct: roughly 9,300 correct and about 670 incorrect, or about 93.28 percent. That's good; it suggests to me we're not overfitting too badly.

I mentioned overfitting, and I want to talk a bit about it. The idea behind overfitting is that you're believing your data too much: you're modeling not just the underlying features, the actual information you want, but quirks, or noise, or random fluctuations. Things you actually don't want become things your model strongly believes you should care about. Noise can come from a lot of places: sensor fuzziness, humans mislabeling things, unmodeled factors you're not thinking about that subtly influence the data. As your model becomes very tightly fit, you'll start to see generalization go down. We can talk about it after, but there are these very interesting charts where you see the error decrease, decrease, decrease, and then you hit an inflection point and the error on your test set goes through the roof, even as it continues to get better on your training set, because you're modeling every nook and cranny of your training data at the cost of not being able to generalize well. In neural networks, overfitting occurs when you have very high weights (part of the reason we initialize to small random weights), and it also occurs when you have lots and lots of hidden nodes, so again, tuning that hidden-nodes parameter will tell you whether you're overfitting or not. I think 93 is pretty good; I'm happy with that. And if you pull down the GitHub repo and play around with it, please do make PRs with different hyperparameter tunings and things like that, and see if you can beat my high score of 93.28.

Cool. So now that we've talked a bit about how neural networks work and how they can be applied to something like the MNIST data set, let's talk about how we used a neural network library in Ruby to build and train a small network for the MNIST data, and then look at the app I put together to let us test it.
As is always the case with the internet, you start out to do something and then find out that somebody else has already done it way better than you, so I'm hugely indebted to Geoff Buesing (I hope I'm pronouncing his name right). When I was about two-thirds or three-quarters of the way through the code for this talk, I found his Ruby implementation for the MNIST data set on GitHub, and I highly encourage you to check it out; it's very, very cool. We have very similar approaches, but his handles things like touch events, so when we get to the demo, mine is unfortunately not going to work on your phone, while his does. If you search "gbuesing" you'll find his repo. I will also accept pull requests from people who want to set up touch events; that would be cool.

Anyway, the front end. My major contribution here is disastrously over-engineering things, so I decided to do the front end, which really probably only needs about 30 lines of JavaScript and jQuery, with React, which I think has been proven to be an infinite number of lines of JavaScript. But why not? Playing with toys is fun and playing with tools is fun, and it's no secret that I don't love JavaScript, but I did find ES6 and webpack and React to be very nice tools, so I think things are getting better. If you don't remember anything else from this talk, keep that in mind: things are getting better. This is just an example (I hope you can read it) of the submit code from the React component. It uses the fetch API to send the canvas data over to the Sinatra server, which does some processing and sends back some JSON saying, "Hey, here's what I think that number is." If you haven't used React: like I said, I enjoy it, and this is an overkill example, but the idea is that you can build these nice, neat little UIs where you have an editable canvas, a prediction, and a couple of buttons. We'll see that when we look at the UI in a bit.

So that's the front end. The back end is Sinatra and Ruby 2.2, soon to be 2.4 (it would have been 2.3, but I got lazy), and the idea is to use the ruby-fann gem to do the training for us, so you can see all the theory we've talked about put into practice. We have some training data that we pull out, and we create a new instance of the ruby-fann artificial neural network: there are going to be 576 inputs, which corresponds to a 24-by-24-pixel image, so that's one input for every single pixel in the image. I picked 300 hidden neurons; like I said, this is something you can tune to see whether you get better or worse performance. And the number of outputs is 10, which, as we talked about, corresponds to the fact that there are ten possible labels. Then we go ahead and train on the data. The values here: `train` is our training data; 1000 is the maximum number of epochs, which means if you train and backpropagate a thousand times and still haven't hit the error threshold, you're not going to do any better, so you can just stop; 10 is just for the console output, telling ruby-fann, "Every 10 epochs, let me know how we're doing, how the error is, whether we're still progressing toward that elusive global minimum"; and 0.01 says that if we get to the desired mean squared error of 0.01, we can stop.
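Put together, the training step looks roughly like this. This is a sketch of how I'd wire it up with the ruby-fann gem rather than the app's exact code; `training_inputs` and `training_outputs` are assumed to be the flattened pixel vectors and ten-slot label vectors we talked about, and `test_pixels` is one new drawing:

```ruby
require 'ruby-fann'

# Assumed shapes (see the feature/label discussion above):
#   training_inputs  -- arrays of 576 pixel intensities (24x24 image, 0.0..1.0)
#   training_outputs -- arrays of 10 values, 1.0 in the slot for the true digit
train = RubyFann::TrainData.new(
  inputs:          training_inputs,
  desired_outputs: training_outputs
)

network = RubyFann::Standard.new(
  num_inputs:     576,   # one input per pixel
  hidden_neurons: [300], # a single hidden layer; a knob you can tune
  num_outputs:    10     # one output per possible digit, 0 through 9
)

# train_on_data(data, max_epochs, epochs_between_reports, desired_error):
# stop after 1000 epochs or once mean squared error drops below 0.01,
# printing progress every 10 epochs.
network.train_on_data(train, 1000, 10, 0.01)
network.save('digit_classifier.net') # so the web app can load it later

# Predicting: feed in a new 576-element pixel vector and take the most
# confident of the ten outputs as the guessed digit.
outputs = network.run(test_pixels)
puts "I think that's a #{outputs.each_with_index.max[1]}"
```

Roughly speaking, the Sinatra side of the demo is those last few lines wrapped in a route: pull the pixels out of the request, call `run`, and send back the most confident digit as JSON.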
Mean squared error is, I feel, one of those things that plagues computer science and math: once you know what it means, the name makes perfect sense, but if you don't know what it means, you're lost. Mean squared error is just the average of the squares of everything we were off by. The average part makes sense, because you want to see what the average error is. We square it partly to make sure the values are always positive, so positive and negative errors don't cancel each other out, and partly because squaring magnifies outliers, so we pay more attention to them. It also has some nice algebraic qualities that I do not fully understand, which is why we square things instead of taking the absolute value; I'm sure somebody on Stats Stack Exchange can tell you a lot more about that than I can.
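Since the name only makes sense once you can see it, here's the whole idea of mean squared error in a few lines of Ruby; the numbers are made up:

```ruby
# Mean squared error: the average of the squares of everything we were off by.
# Made-up example: what the network said versus what the answers really were.
predicted = [0.9, 0.2, 0.8, 0.4]
actual    = [1.0, 0.0, 1.0, 0.0]

squared_errors = predicted.zip(actual).map { |p, a| (p - a)**2 }
mse = squared_errors.inject(:+) / squared_errors.length

puts mse # => 0.0625; training stops once this drops below the 0.01 we asked for
```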
Cool. So: the front end is React and ES6 and things like that; the back end is Ruby and Sinatra using the ruby-fann gem. I feel like that's enough of me talking for now, so I'm going to go ahead and do something dangerous, which is a demo. I'm not going to do any live coding (exactly), and nothing super crazy, but we'll see how well this works.

All right, there we go. I'm running this locally. I actually turned the Wi-Fi off and practiced the talk without it, because I knew that if I relied on the Wi-Fi, something terrible would happen. As you can see, there's nothing here; I did a prediction earlier and then cleared it. Let's draw a seven... I think that's a pretty good seven... it's thinking... oh, it's a three. All right, that's great. There are some where it comes out wrong and I genuinely don't understand why; like I said, these are black boxes. Let's do a one... sometimes it thinks these are zeros for some reason... there we go.

Let's do another one. Any recommendations from the audience? A smiley face? Let's do it and see what a smiley face is... that's as well as I can draw one, somehow... there we go: zero. It only does one digit at a time, so if you do something fancy like "18" it's not very good; it says it's a five, which actually is not so wrong if you add one and eight and divide by two, though I don't think it's doing anything that fancy. I'll do a loopy two; it's usually pretty good at loopy twos... yeah. A closed-up four... let's see if I can do this; it also relies on my drawing ability, which is not great with a mouse... okay, four. An eight... a seven with a bar... there we go; I'm always surprised when it gets that right. I'll try a European one... I'm pleasantly surprised by that. A rounded nine... that's not a very good nine; it's not going to know what that is... wow. And a nine with a bar... there you go. A zero with a slash? Why not... where's my mouse... I would have expected an eight or something... okay, a square zero... that is better than I would have done; if you had shown me that, I would not have known it was a zero. All right, last chance, any more numbers? An eight made of two circles? This part is the reason the talk actually takes 35 minutes; I budgeted about 30 minutes just for playing around. And we had an ampersand, which, that's kind of like an ampersand, right? And a cursive S, so let's do a cursive S too; I have to look at my keyboard for the ampersand... all right, the ampersand is a six, but you can see why it would think that; it is sort of six-shaped. Almost... still six. All right, last one: stick figure man? I don't know what "last one" means, but let's do stick figure man, and then I'm done... there we go: eight. It kind of looks like one, okay. All right, back to the show.

Cool. So we've reached the point where we start summarizing. What did we learn? We looked at machine learning and saw that it is, generally (pun intended), generalization; that supervised learning is effectively taking labeled data and figuring out unlabeled data from what we know about the labeled data; and that neural networks are awesome. Neural networks are super cool. They do have some pitfalls: like I said, they're a bit of a black box, and they can overfit, but if you tune them and play around with them enough, they can get really good results. And you can do all of this with Ruby, which is awesome. I've done a fair amount of machine learning work over the past few months in Python and Java and Clojure, and it's been super nice to be able to do it in Ruby.

Now I'm going to tell you a scary story, and then I'm going to finish with an inspirational one. The scary story: I was giving a talk similar to this one at a Clojure conference, on a different data set. I was building decision trees based on Los Angeles police data from the year 2015, and the idea was that if you know somebody's sex, race, and stop type (either pedestrian or vehicular), you can predict the incidence of what they call post-stop activity. The police literature doesn't say a lot about this publicly, but post-stop activity means an arrest or a search. It turns out that, with about eighty percent accuracy, if you know just somebody's sex and race, you can determine from the LAPD data whether or not there was post-stop activity, which is sort of horrifying on its own. And you'll see stuff in the news now where people say, "Oh, we can solve this problem with machine learning. We can figure out what this number is, we can figure out whether this is a car, we can figure out who to arrest, because the machine will tell us." The thing is, if you have biased data, or even racist data, you will end up with a racist machine. We have to be extremely careful about what we believe about the data and what data we put in, because it will affect what comes out. So I caution you, as you do machine learning on big data sets, like medical data or police data or what have you, things that are frankly more emotionally salient and important than classifying numbers: think carefully about that. Think about what it means for the machine to say, "Yes, you should definitely arrest this person," because of their color or because of their sex. We have to be careful.

So that was the scary story; this is the inspirational one. This is the TL;DPA, the "too long, didn't pay attention," and we're very close to the end. We can do machine learning with Ruby, but the tools in Ruby are not as good as the tools in Python and Java. Ruby is a phenomenal community, full of smart, motivated, really supportive people, and if we want to do machine learning in Ruby, we have to build these tools. We have to contribute, we have to maintain, we have to break new ground, and we have to be willing to go out there and build the stuff we want,
to be the change we want to see in the code, to horribly steal from Gandhi. But we can do it; this is something we can do if we are diligent and if we really want to. Kenta Murata gave a really good talk at RubyKaigi about the state of SciRuby and the need for contributors, the need to grow. Whether or not we have bindings for things like scikit-learn, whether or not we have bindings for things like Weka or other Java tools like DL4J, whether we're doing stuff in JRuby or MRI or Rubinius, we need to build these tools if we want to have them. So please do think about contributing to tools like ruby-fann and to projects like SciRuby. Feel free to check out the public version of the Ruby MNIST app; it's a Heroku app, and like I said, it will not work on your phone, because I'm lazy, but pull requests are welcome, and the code is on github.com under my name. I am super confident that if we work together really hard on this, we can do great stuff, machine learning using a language that we all love. So that's all I've got. Again, thanks so much for coming; I really appreciate you all being here, and I think we have five to ten minutes for questions if anybody has any.

[Q] So the question is: are there well-maintained, recent Ruby gems, things we know are under active development, that we can rely on for machine learning? At the moment there's nothing that I know of that's much better and more recent than something like ruby-fann, so if we do want tools like this, we've got to make them.

[Q] Sure. The question is how neural networks scale when you throw more resources at them. Do you mean in terms of just training faster, or does it get better if you add more neurons? Right. So you will tend to overfit with more neurons per layer, and I think with more neurons generally, but there are some interesting architectures coming out of places like Google that are very deep neural networks with lots of neurons, and they have found ways to mitigate overfitting. One thing you can do is regularization, where you prevent your weights from getting too big. The problems with deep networks are twofold: one is that your weights can get very, very large, which leads to overfitting; the other is that you can have what's called an exploding or vanishing gradient, the idea being that when you go through all these deep layers, the error signal from the very end that's supposed to backpropagate and adjust your weights becomes extremely weak. There are tools for mitigating that, but large networks can be prone to those problems. So the answer is not necessarily yes in terms of scaling, but there's one thing I can think of, which is using an ensemble machine learning method like boosting. The way boosting works is that you have these little learning algorithms, whichever you want, and they have to be what are called weak learners, which just means that, on average, if there are two possible labels, they're correct more than half the time. The idea behind boosting is that you train all these little weak learners on subsets of the data, and then they work together; they effectively vote. That's an over-generalization, but that's how it works. So if you wanted, you could have a bunch of different machines, or neural networks, training on subsets of the data and running in parallel, and you could then do something like boosting afterwards. Boosted neural networks do tend to work pretty well. They can take a long time, but like I said, if you're parallelizing the actual neural network training, you might see some savings there.
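To sketch the "committee of little learners that vote" idea: this is a simplified, bagging-style illustration of my own, not the full boosting algorithm with its example re-weighting, and it reuses the assumed `training_inputs` and `training_outputs` arrays from earlier:

```ruby
require 'ruby-fann'

# Train several small networks, each on a random subset of the data, then
# let them vote on new examples. (Real boosting also re-weights the examples
# each round; this is just the "committee of weak learners" shape of the idea.)
def train_committee(training_inputs, training_outputs, members: 5, sample_size: 10_000)
  Array.new(members) do
    indices = (0...training_inputs.length).to_a.sample(sample_size)
    data = RubyFann::TrainData.new(
      inputs:          indices.map { |i| training_inputs[i] },
      desired_outputs: indices.map { |i| training_outputs[i] }
    )
    net = RubyFann::Standard.new(num_inputs: 576, hidden_neurons: [50], num_outputs: 10)
    net.train_on_data(data, 200, 0, 0.02) # each member is deliberately small and cheap
    net
  end
end

def committee_predict(nets, pixels)
  votes = nets.map { |net| net.run(pixels).each_with_index.max[1] }
  votes.group_by(&:itself).max_by { |_, v| v.length }[0] # majority vote wins
end
```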
The issue with boosting: there are two ways it overfits that I know of. One is that if the underlying model overfits, the ensemble will tend to overfit, so you have to be careful about not making those neural networks too big or letting the weights get high. The other is in the presence of what's called pink noise, sort of uniform noise. The way these little weak learners work, they pay more attention to examples they've gotten wrong, and they will work extremely hard to pick up right answers to things they've gotten wrong before. So if you have uniform noise in your data, the ensemble will spend an unbelievable amount of time trying to get all those little pieces of noise correct, and that will contribute to overfitting. So I guess the TL;DR there is that networks don't necessarily scale well just by adding more neurons, but I can see a way you could do it with more machines.

[Q] Sure, good question. The question is whether there are other publicly available data sets I might look at besides MNIST. There are a bunch, and they are super interesting. A lot of cities now, like London, LA, and New York, have open data projects, so if you search "LA open data" you'll find data sets, including the LAPD open data. The University of California, Irvine has a very large machine learning repository, with things like the incidence of diabetes in certain populations, or the incidence of heart disease, or data about abalone. There are a lot of very interesting data sets that are medical or legal or things like that, and, like I said, more emotionally and culturally salient than "what number is this." There's a whole bunch, and I encourage you to seek those out; I think there might even be a couple of GitHub repos that are just lists of cool data sets.

[Q] Sure. The question is: if you can't find the data you want, is there a way to generate it? The short answer is yes; the long answer is that it really depends on what the data are and what you need. I know there are some initiatives now aimed at places that have historically been closed off, things that are very difficult, like medical data. I've been doing some machine learning work on the heart disease data set from UCI, and the issue is that there are just a lot of dimensions: the data have 13 or 14 attributes, everything from resting blood pressure to cholesterol, and only about 300 instances. There's an idea in machine learning called the curse of dimensionality: the more features you have that you want to train on, the amount of data you need to train effectively goes up exponentially. You can think of it like a point versus a line versus a square versus a cube; you get this huge increase in the amount of data you'd need to explain those features. So I know there are initiatives to work with places like hospitals to anonymize and release large amounts of data, but right now, in terms of generating it yourself, unless you're going to spearhead your own study, I think that while it can be done, it's a time- and resource-intensive thing.
[Q] Sure. The question is whether I know the history of why ruby-fann is a thing, as opposed to, say, writing bindings for TensorFlow. I don't know for sure. I think the last time I checked, people were working on Ruby bindings for TensorFlow. ruby-fann is a few years old; I think it's older than TensorFlow, or at least older than TensorFlow being very popular, so part of the reason is that it's what was there when people started doing this. The other thing is that people tend to look at these problem domains and say, "Well, I really like Ruby, but there's all this tooling, like Theano or TensorFlow or scikit-learn, and it's all Python," and a lot of people don't have a strong opinion between Ruby and Python, so they say, "I'll do it in Python; it'll be fine." The same thing happens with Java: if they have DL4J or something like that, they view it as a hassle or unnecessary to do it with JRuby instead. So I think there's just inertia there, and I think we need a concerted effort either to build bridges from Ruby to Python and Java, or to build new stuff in Ruby, if that's the language we want to use for this kind of work.

[Q] Sure. The question is how we use machine learning at Hulu, or what some common use cases for machine learning are. My team does not do machine learning directly (our machine learning team is in Beijing), but I know that a lot of our recommendation stuff is machine-learning driven. So you can use it for things like recommendations, and I think image recognition is a common one. If you've ever seen one of those Google deep learning papers where a machine identifies a bird in a picture with, say, 95 percent accuracy, that's an application of machine learning, and generally what they're using are what are called convolutional neural networks. Convnets (I sort of alluded to this earlier), rather than downsampling and centering your image and looking at it as a whole, have these little sets of filters that scroll over, that convolve over, the image. You can think of it almost like taking a little piece of paper with a shape cut out and sliding it over a newspaper: when the things underneath line up with the cutout, the signal is very strong. You can imagine a kind of wave function where, when things line up with that filter, you get a bright patch. So you'll have a little filter that picks up diagonal lines, or a little filter that picks up horizontal lines, or vertical lines, and these get aggregated into what are called feature maps, which are a sort of four-dimensional representation of the image. Then there's something called downsampling, and max pooling, where you basically pick the brightest spots of the bright things, use those to represent the data, and throw some information away to keep the problem computationally tractable. You keep doing this over and over and over, and, very cool, shapes emerge out of that; then once you have them, you can train on those data. So yeah: image recognition, recommendations, things like that are general applications. Cool.
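As a rough illustration of that filter-scrolling idea (a toy sketch of mine, nothing like a production convnet): slide a small filter over an image, record how strongly each patch lines up with it, then max-pool to keep only the brightest responses:

```ruby
# Toy convolution: slide a small filter over a 2D image and record, at each
# position, how strongly the patch under the filter lines up with it.
def convolve(image, filter)
  out_rows = image.length - filter.length + 1
  out_cols = image[0].length - filter[0].length + 1
  Array.new(out_rows) do |r|
    Array.new(out_cols) do |c|
      sum = 0.0
      filter.each_with_index do |frow, fr|
        frow.each_with_index { |fval, fc| sum += fval * image[r + fr][c + fc] }
      end
      sum
    end
  end
end

# Toy max pooling: keep only the brightest response in each 2x2 block,
# throwing detail away to keep things computationally tractable.
def max_pool(feature_map, size = 2)
  (0...feature_map.length).step(size).map do |r|
    (0...feature_map[0].length).step(size).map do |c|
      (0...size).flat_map { |dr| (0...size).map { |dc| feature_map[r + dr][c + dc] } }.max
    end
  end
end

# A filter that "lights up" on diagonal strokes, applied to a made-up 7x7 image
# (sized so the 6x6 feature map pools evenly into 2x2 blocks).
diagonal_filter = [[1.0, 0.0], [0.0, 1.0]]
image = Array.new(7) { Array.new(7) { rand.round(1) } }
feature_map = convolve(image, diagonal_filter)
p max_pool(feature_map) # the pooled "bright spots" a deeper layer would train on
```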
Well, like I said, thanks so much; I really appreciate it. And if you have any questions or want to come find me afterwards, please do.