Generative Testing: Properties, State and Beyond

Good afternoon, ladies and gentlemen. I'm very happy to be here today in Barcelona. I'm here to talk about testing, but before I start, allow me to introduce myself briefly. My name is Jan Stępień. I'm originally from Warsaw, a city in Poland, a beautiful place; I recommend a visit, it's lovely this time of the year. I've been using Clojure since 2010, and I work at Stylight in Munich, mostly on the back-end side of our systems.

Let's talk about testing. The testing culture is quite strong in the Clojure community. When you generate a new project, typically with Leiningen, you get one test for free as part of the project skeleton, and, to make matters worse, the test fails, so you have to fix it. This is good. But why do we test things automatically? We have the REPL, right? We could just take all of our test expressions, put them directly into the REPL and check whether everything works as expected or not. We don't do this because, just like sequences in our favourite language, we are lazy, and that's a good thing, a very good thing. We take things which are easy and can be automated and let computers do them for us. Keeping this automation and laziness in mind, let's dig into the world of properties, the core concept of property-based testing.

Let's start with a very simple example: division. Assume we have an arithmetic library we want to test, and we start with the division operator. We could write a clojure.test test case which checks whether the division of two integers works as expected, whether the division of a double by an integer works as expected, and finally whether an ArithmeticException is thrown when it should be thrown. My question is: is this enough? Is this sufficient to be sure that our library is properly implemented? If not, how many more tests do we need? Ten more? A hundred? A thousand? It's difficult to judge.

Instead, let's see how we can refactor this test a bit. Notice that if we set the ArithmeticException case aside, we can use the are macro, extract the test expression into one line and then just keep specifying more and more input values until we are sure that our implementation is right. Now let's do some mathematics: a / b = a / b. Let's multiply both sides of this equation by b and simplify the right-hand side, which gives (a / b) * b = a. If we express this equation in plain English, we end up with a property which should hold for division in general. Let's take this property and put it into our test. What we have now is something a bit different: our test cases no longer have to specify a result. We just specify input values; we don't care about the output, we only care that the property holds for every single input.
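To make this concrete, here is a small sketch of the two styles just described; the namespace and the example values are illustrative rather than the talk's actual slides, assuming an arithmetic library whose division operator is plain /.

(ns arithmetic-test
  (:require [clojure.test :refer [deftest is are]]))

;; Example-based: every case needs a hand-picked expected value.
(deftest division-examples
  (is (= 2 (/ 4 2)))
  (is (= 2.5 (/ 5.0 2)))
  (is (thrown? ArithmeticException (/ 1 0))))

;; After the algebraic refactoring only the inputs remain: for every a and b
;; we claim that (= a (* b (/ a b))).
(deftest division-inputs-only
  (are [a b] (= a (* b (/ a b)))
    4   2
    5   3
    -7  2
    100 25))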
Wouldn't it make sense to have those input values generated for us, for free, so that the property holds for arbitrary a's and arbitrary b's? It turns out that this is exactly what QuickCheck is all about. Originally introduced in a paper from 1999, QuickCheck for Haskell was quickly ported to other languages, and it is available, among others, for Clojure. Let's see how we could use test.check, one of the Clojure ports of QuickCheck, to improve the tests of our arithmetic library. Let's require a couple of namespaces: the core test.check namespace and the namespaces responsible for generators and properties.

Using what we've just required, we can now define a so-called property; prop is a customary name for such symbols. For all a's being ints and for all b's being ints, we claim that the following holds; notice that if you squint, it's nothing but plain mathematics. Now, using the quick-check function, we can generate 1000 random input values, pipe them into our property and verify that everything works as expected. If we evaluate this expression, we get a map back, and it doesn't look that good: an ArithmeticException, divide by zero. Well, indeed, we never said that b cannot be 0, so we need to fix our property: we need to exclude 0 from the possible values of b. To do this we use the such-that combinator, which behaves just like the filter function, if you like: given a predicate and a generator, it produces a new generator of values which match the predicate. Using gen/sample we can look at the values produced by various generators; here we can see that the generator of integers generates ints just as before, while our new derived generator excludes all the zeros, just as we wanted. Now we can plug in our newly defined generator and look at the result of the new property. We execute the test again, and this time, after 1000 executions, test.check did not find any input values which would contradict what we claimed in the property.

But, as you remember, our original tests used doubles as well, so let's extend our test to cover doubles too. The version of test.check I used didn't have a built-in generator of doubles, so let's write one of our own. fmap is something like map, but for generators: here we are mapping the double function over the generator of integers, and, as the sample function shows, we get integers mapped into doubles. Once again we have to exclude zero, so we use our old friend such-that, and this time we have a generator of doubles excluding zeros. We plug both generators into our test case, we run 1000 tests, and the result is false. What's happening? If you look at the :fail key you will see the values 23 and 21; these are the a and b for which the test fails. This is puzzling, so let's look at what's happening inside. Indeed, if we plug those values directly into the test, it evaluates to false. So what's the actual issue? It turns out we ran into the fact that, on the computers we use on a daily basis, operations on floating-point numbers are not exact, and our claim about the equality of those divisions and multiplications was simply not right. This is something generative testing excels at: it comes up with all the crazy possible combinations of input values, subjects your properties to a hail of tests and then hands you a ready-made input value to consider for debugging. A wonderful tool.
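Reconstructed from the description above, the test.check version looks roughly like this; the names are illustrative, and gen/int stands in for whichever integer generator the talk used.

(ns arithmetic.division-test
  (:require [clojure.test.check :as tc]
            [clojure.test.check.generators :as gen]
            [clojure.test.check.properties :as prop]))

;; such-that works like filter: keep only the generated values matching
;; the predicate, i.e. integers other than zero.
(def non-zero-int
  (gen/such-that (complement zero?) gen/int))

;; fmap is map for generators: turn the generated integers into doubles.
(def non-zero-double
  (gen/fmap double non-zero-int))

;; For all integers a and non-zero integers b, dividing and then
;; multiplying back must return a.
(def division-prop
  (prop/for-all [a gen/int
                 b non-zero-int]
    (= a (* b (/ a b)))))

(tc/quick-check 1000 division-prop)
;; => {:result true, :num-tests 1000, :seed ...}

;; The same claim over doubles fails, because floating-point arithmetic is
;; not exact; quick-check reports the counterexample under :fail.
(def division-prop-doubles
  (prop/for-all [a non-zero-double
                 b non-zero-double]
    (= a (* b (/ a b)))))

(tc/quick-check 1000 division-prop-doubles)
;; => {:result false, :fail [...], :shrunk {...}, ...}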
Now I'd like to show you something different; let's talk about reverse. If we want to test the reverse function, once again we have to think about some general property that holds. Well, for sequential collections like lists or vectors, reversing a reversed collection gives back the same collection. It would be excellent if we could just state it that way, as an equation between functions, but there are reasons why that does not work directly; instead we can write it like this: for an arbitrary collection, the reverse of the reverse of the collection is equal to the original collection. This time I'm going to write a proper clojure.test namespace to test this behaviour. Let's create the project's core-test namespace and assume a partially correct implementation of reverse. We will need the defspec macro, which corresponds to clojure.test's deftest macro. Using it, we write the following: the round-trip property of reverse says that for all collections being lists of integers, the collection is equal to the reverse of the reverse of the collection.

Let's run it; we can run it using the standard clojure.test tooling. The resulting map says that the property doesn't hold, and the failing case, under the :fail key, is (0), a list consisting of just 0. Well, that makes sense. Notice also the :seed key: this is the random seed used to seed the random number generator which in turn produces the values for our test cases. On a new execution a new seed is generated, and our test is subjected to a different sequence of test cases. Let's try it: this time test.check found a different failing scenario, the list (2 1). And now let's look at the part of the map I've been ignoring so far, the :shrunk sub-map. It contains the result of the so-called shrinking procedure. Shrinking means minimising the failing test value: after QuickCheck, and specifically test.check, finds a value for which the property does not hold, it uses certain properties of the generators to make that value smaller and smaller, to find the minimal value for which the test still fails, and then it hands you this minimal failing case.

I can't find words to describe how awesome this is. Assume you've got a domain which you represent using nested, potentially recursively nested maps. test.check will not throw five megabytes of a data structure at you and tell you to debug that; instead, it will do its best to minimise it to a minimal map that still makes your test fail. Again, automation: this is exactly what a human would do, trying to figure out the minimal scenario for which the logic doesn't match the specification, and it's done automatically for us. It's outstanding.
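A sketch of what such a defspec can look like; almost-reverse is a hypothetical stand-in for the partially correct implementation, which isn't shown in the transcript.

(ns project.core-test
  (:require [clojure.test.check.clojure-test :refer [defspec]]
            [clojure.test.check.generators :as gen]
            [clojure.test.check.properties :as prop]))

;; Hypothetical stand-in for a partially correct reverse: it gives up on
;; collections with fewer than two elements.
(defn almost-reverse [coll]
  (when (next coll)
    (reverse coll)))

;; defspec corresponds to clojure.test's deftest, so the property runs under
;; the usual clojure.test tooling and reports :seed, :fail and :shrunk.
(defspec reverse-round-trip 100
  (prop/for-all [coll (gen/list gen/int)]
    (= coll (almost-reverse (almost-reverse coll)))))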
So far it seems that test.check is an excellent tool as long as your domain is expressible using lists and integers. Arguably most domains can be expressed this way, but let's look at something more applied; let's look at Stylight. We maintain several services in Clojure, and one of them is our main website. This system has quite complicated routing logic, and this logic has been a source of many bugs, regressions, internal conflicts and other problems. We wanted to rewrite it, and we wanted to get it right this time. We also wanted a comprehensive test suite which would make sure that this time no bug is left behind. Property-based testing was a natural candidate for this problem. We decided to hide the entire implementation of our new routing and route generation behind a simple API exposing just two functions: one mapping a path to a descriptor, where a descriptor is a data structure representing a given page on our website, and a corresponding function for generating path strings from descriptors.

How would we attack this problem? How would the tests be generated? Well, reverse once again: this exhibits a property very similar to the reverse case. Assuming we stay within the same context, composing those two functions yields the identity function in the domain of paths, or of descriptors. So let's test it. What we wrote looked as follows: for all descriptors produced by our custom generator of valid descriptors, and for all contexts, the equality I've just described should always hold. The number of bugs and regressions this single test has uncovered in our implementation is unbelievable. It's also a wonderful tool for test-driven development: if you're into writing tests, seeing them red, implementing something and seeing them green, you'll love this. What you do is write a generator, write a property, and it will just keep giving you more and more failing scenarios until you get your implementation right according to your claims. And when you're done, you extend your generators to broader subsets of your domain, or write more properties. It works like a charm; it works for us like a charm.
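The real generators and routing functions are internal to Stylight, so here is only a toy sketch of the same round-trip idea, with made-up descriptor->path and path->descriptor functions and a made-up generator of valid descriptors; the actual property also quantified over a request context.

(ns routing-sketch
  (:require [clojure.string :as str]
            [clojure.test.check :as tc]
            [clojure.test.check.generators :as gen]
            [clojure.test.check.properties :as prop]))

;; Toy stand-ins for the real API: a descriptor is a map naming a page,
;; and the two functions translate between descriptors and path strings.
(defn descriptor->path [{:keys [page id]}]
  (str "/" (name page) (when id (str "/" id))))

(defn path->descriptor [path]
  (let [[page id] (rest (str/split path #"/"))]
    (cond-> {:page (keyword page)}
      id (assoc :id (Long/parseLong id)))))

;; A toy generator of valid descriptors: two kinds of pages.
(def gen-descriptor
  (gen/one-of
   [(gen/return {:page :home})
    (gen/fmap (fn [id] {:page :product :id id}) gen/pos-int)]))

;; The property: going to a path and back must be the identity on descriptors.
(def routing-round-trip
  (prop/for-all [descriptor gen-descriptor]
    (= descriptor (path->descriptor (descriptor->path descriptor)))))

(tc/quick-check 1000 routing-round-trip)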
Let's look at some other use cases. In his talk, John Hughes, one of the co-authors of QuickCheck, discusses a lot of interesting topics, among them a problem they found at a car manufacturer, I believe, in the internal bus over which messages are sent between the various subsystems of the car. They found a genuinely life-threatening bug thanks to generative testing. A very interesting talk, I really recommend checking it out; a word of warning, though: if you watch it, you won't feel safe in your car any more. Another use case is Kyle Kingsbury and his work on distributed databases, Jepsen. If you're not familiar with Kyle's work: he subjects distributed databases to automatically generated sequences of operations executed concurrently, then introduces glitches into the network connecting those databases, and checks whether each database keeps the promises it should guarantee in such scenarios. A really interesting and very real-world, very applied use of generative testing. Again, a word of warning: if you watch this talk, you won't feel safe using your database ever again.

Fine, so we see how certain people use generative testing for some really complex and interesting problems, but how do they do it? To understand that, we have to dig into a world of things which are rather taboo in Clojure: let's talk about state. To test stateful systems, generative testing works on a somewhat different basis. Originally introduced in a paper from 2006, stateful generative testing was used to test large telco systems at Ericsson; Ericsson was, I believe, the first customer of the startup founded by the creators of QuickCheck. It's a very interesting paper in which they go into detail about how their tool works and how it can be applied to really complex domains. Instead of discussing it in depth, I will present a simple example. The general idea behind testing stateful systems with generative testing is as follows: you generate a sequence of actions which you apply to the system under test, thus mutating its state; at the same time you keep track of all the changes in an internal model of the system, and after each application you verify that the state of the system is exactly as the model suggests it should be.

Let's look at a concrete example, once again something simple. In our case the mutable state will simply be a var pointing to a Clojure vector, and what we want to determine is whether conj adds elements to the end and pop removes the last element. To test the system we will maintain our vector and a var pointing to it, and at the same time we will keep track of the vector's elements in a separate collection; after each application of either conj or pop we will check that the elements in the vector are identical to the elements in our model of the vector. Let's require a couple of namespaces: defspec, which we've seen before, the generators, and a new library which I'm introducing on this slide, called states, which does all the hard work of generating sequences of operations for us.

Let's define the following test case. The property of vector operations is built from a commands generator, a next-step function and a postcondition; that's a lot of arguments, and I will discuss them in turn. There is also a map which stores the initial state of the system, an empty vector. commands is a generator of the operations which will be applied to the system. It looks quite complex; as its argument it takes the state of the system under test. Instead of going into the details of its implementation, let's look at a sample of the values it returns. As you can see, given a state in which the vector is the symbol a and the list of elements we keep track of is (1 2 3), it generates a sequence of simple Clojure s-expressions which would be applied to the vector. The next function to discuss is next-step. As its arguments it takes the state of the system under test, the var to which the result of the last operation was assigned, and the last operation which was executed. We dispatch on the operation which has just been executed: if it's conj, we add the element which was conjoined to our internal model; in the case of pop we do the reverse and drop one element. In both cases we keep track of the var pointing to the vector under test. Finally, the postcondition is where our assertion lives. It takes as its arguments the state of the system, the command which was executed and the value returned by that command. Here we only care about the pop case: if the operation was pop, we expect the returned value to be identical to our internal model of the system; if it's not pop, we assume everything is all right.

Let's run our test and see what happens. It fails: postcondition unsatisfied. The output shows the state the system was in, and under the :fail key you see the sequence of operations which led to the postcondition being violated. If the sequence of operations were longer, then, thanks to shrinking, it would be minimised to a smaller, hopefully minimal, sequence of operations which reproduces the incorrect behaviour. So what's wrong? It turns out I made a mistake, not in the postcondition but elsewhere: I keep track of the elements of the vector in a list, and I conj elements onto the front of the list instead of adding them to the end. What fixes the problem is reversing the elements in the equality check. With this small change I can rerun my tests, and this time, after 100 random executions of various sequences of commands against my state, everything is all right.
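The states library does this bookkeeping for you; since its exact API isn't shown here, the following is a simplified, purely functional sketch of the same idea using plain test.check: generate a random sequence of operations, apply them both to the vector and to a list-based model, and require that they agree after every step. The comment on the :conj branch points at exactly the kind of mistake described above.

(ns stateful-sketch
  (:require [clojure.test.check :as tc]
            [clojure.test.check.generators :as gen]
            [clojure.test.check.properties :as prop]))

;; An operation is either [:conj n] or [:pop].
(def gen-op
  (gen/one-of
   [(gen/tuple (gen/return :conj) gen/int)
    (gen/return [:pop])]))

;; Apply one operation to both the real vector and the list-based model.
;; Note: using (conj model x) here would add to the FRONT of the list,
;; which is the mistake described in the talk.
(defn step [[v model] [op x]]
  (case op
    :conj [(conj v x) (concat model [x])]
    :pop  (if (empty? v)
            [v model]                      ;; popping an empty vector would throw
            [(pop v) (butlast model)])))

;; After every step the vector and the model must contain the same elements.
(def vector-matches-model
  (prop/for-all [ops (gen/list gen-op)]
    (every? (fn [[v model]] (= (seq v) (seq model)))
            (reductions step [[] ()] ops))))

(tc/quick-check 100 vector-matches-model)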
If you'd like to see a more complex application of this method of testing, I recommend taking a look at a project which has already been mentioned today. This time, however, I'm not interested in the implementation of the library but in its tests: in the test namespace, its author uses generative testing to generate sequences of operations which are executed both against his implementation of the map and against the standard library implementation, and finally he checks that both maps end up identical. This doesn't guarantee that his implementation is correct, but it greatly increases the likelihood.
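As a miniature illustration of that comparison strategy, here is a sketch that plays the same game with two standard-library map implementations; in the project discussed above, one of the two would be the author's own data structure.

(ns comparison-sketch
  (:require [clojure.test.check :as tc]
            [clojure.test.check.generators :as gen]
            [clojure.test.check.properties :as prop]))

;; Random assoc/dissoc operations over integer keys and values.
(def gen-map-op
  (gen/one-of
   [(gen/tuple (gen/return :assoc) gen/int gen/int)
    (gen/tuple (gen/return :dissoc) gen/int)]))

(defn apply-op [m [op k v]]
  (case op
    :assoc  (assoc m k v)
    :dissoc (dissoc m k)))

;; Applying the same sequence of operations to both implementations
;; must leave them equal.
(def implementations-agree
  (prop/for-all [ops (gen/list gen-map-op)]
    (= (reduce apply-op {} ops)
       (reduce apply-op (sorted-map) ops))))

(tc/quick-check 100 implementations-agree)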
On top of this strategy of testing stateful systems, generating operations to execute against your system and a model of it, you can build even more complex things. In the talk I've mentioned once already today, John Hughes goes into detail about generating concurrent sequences of operations, applying them to the state from multiple threads at the same time, and afterwards checking whether the system under test still keeps its promises in the presence of any potential race conditions between those concurrent threads. What you get in return is nothing less than automated detection of race conditions. John Hughes's company, Quviq, offers commercial support for a product for testing such parallel models, but in his talk, which I encourage you to watch, he discusses very interesting applications: he starts with simple cases, such as ticketing machines, and then moves on to some really low-level thread-pool libraries running inside the Riak database, telling stories of how they used such tools to find some really intricate bugs deep inside the implementation. A fascinating video. Another interesting talk I can recommend comes from last year's EuroClojure in Kraków: Philip Potter introduces a new era of testing and presents an interesting combination of tools. He combines tools for generative testing, like test.check, with the projects Kyle Kingsbury developed for his Jepsen experiments, and puts them together to achieve something similar to the tools offered by John Hughes's company, namely automatic detection of race conditions in your Clojure applications. I really recommend it: a very interesting talk and a very interesting solution to a difficult problem.

If you'd like to learn more, there is a lot of reading material; no need to take pictures, the slides will be on the internet momentarily. The first two links are Clojure libraries which implement this idea of stateful generative testing; the following two links are Erlang implementations, also very interesting and very illustrative to read. In the blog post section there is an experience report covering the story I've already mentioned about using QuickCheck to find race conditions in thread-pool implementations. The next link is a blog where you'll find a very interesting list of tips, tricks and links to interesting materials and libraries, all dedicated to being more effective at using generative testing to check your code. And if you have long commutes and are into podcasts, I recommend the QuickCheck episode of the Mostly Erlang podcast, an interview with John Hughes in which he tells interesting stories about how his work on QuickCheck began, including the fact that the initial version of the QuickCheck paper was rejected, and that after finally being published it received, ten years later, the award for the most influential paper of the past decade. A very interesting podcast, a couple of good stories, and an hour well spent.

I hope that in this short talk I was able to convince you that generative testing is a viable alternative, or a viable complement, to traditional testing methods. Instead of specifying unit tests one by one, you define more general, broader properties and use tooling which automatically verifies them by subjecting them to a hail of test cases. If something is wrong, you get back a minimal example showing you where your system doesn't match your specification. On top of these simple foundations you can build even more powerful tools: you can test stateful, real-world systems, and, by making the generated operations concurrent, you can use the same approach to automatically detect race conditions in really complex software projects, just as Kyle Kingsbury does in his Jepsen work. What lies beyond that is a question I leave to you. That's all I've got. Thank you all very much.