The Top Things I’ve Learned Since The Phoenix Project Came Out

Ah, thank you, Dominica. So I've had the privilege of studying high-performing technology organizations since 1999. Specifically, those are the organizations that had the best project due-date performance in development, the best operational availability and stability in ops, as well as the best posture of security and compliance. Our goal was always to understand how these amazing organizations made their good-to-great transformation. Why would we do that? Because we want to understand how other organizations could replicate their amazing outcomes.

You can imagine that in that 18-year journey there were many surprises, but by far the biggest surprise was how it took me into the DevOps movement, which I think is urgent and important. The last time we've seen any industry disrupted to the extent that our industry is being disrupted was probably manufacturing in the 1980s, when it was revolutionized through the application of lean principles. I think that's exactly what DevOps is: it's what you get in the technology value stream when you apply those same lean principles.

So in the next 30 minutes, what I want to do is share with you what I've learned since The Phoenix Project came out in 2013, and it is impossible for me to overstate how much I've learned. I want to share with you what those biggest learnings are, and some of these are things I wish I had learned before we actually put out The Phoenix Project in 2013.

The first one is the extent of the business value that we create by applying DevOps principles and patterns. This is work that we did with Dr.
Nicole Forsgren, who's here somewhere in the room, Jez Humble, a DevOps Handbook co-author, as well as Alanna Brown from Puppet. What we found is that the high performers are massively outperforming their non-high-performing peers, and this is based on what is now going on four years of research spanning 26,000 respondents, or will be by the end of the year.

So surprise number one is how high performers are just getting a lot more done. They're doing 200 times more frequent deployments (that could be deployments of code, or it could be deployments of changes in the environment), and more importantly they can complete those deployments 2,500 times more quickly. That is, how quickly can we go from a change being introduced into version control (and version control is not just for development, it's for everybody), through some sort of test process, through some sort of deployment process, so that it's actually running in production and customers are actually getting value? High performers can do it usually in minutes, worst case hours, whereas lower performers might require weeks, months, or quarters.

It's not just that they're getting more done; they're getting far better outcomes. When they do a production deployment, high performers are one third as likely to have that change blow up and cause an outage, a service impairment, a security breach, or a compliance failure. And when something goes wrong, they can fix those issues 24 times more quickly; in other words, their mean time to restore was 24 times faster.

This was such a decisive finding when it first came out in 2014 because it spoke to a deeply held and common experience that we've had: in general, the larger the size of our deployments, the more things go wrong, and the bigger the crater that we make in the data center. The only way that we can get these sorts of reliability profiles is to be doing smaller deployments more frequently. That has been validated year over year.
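As an illustration of the four measures described above (deployment frequency, lead time, change failure rate, and time to restore), here is a minimal Python sketch. The `Deployment` record, its field names, and the sample data are all hypothetical, invented for this example. It is not the survey methodology behind the State of DevOps research, only one way a team might compute similar numbers from its own deployment log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record layout)."""
    committed_at: datetime                  # change entered version control
    deployed_at: datetime                   # change running in production
    failed: bool = False                    # caused an outage or impairment
    restored_at: Optional[datetime] = None  # service restored, if it failed


def summarize(deploys, period_days):
    """Compute the four measures over a window of `period_days` days."""
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]
    mean_restore = (
        sum(restore_times, timedelta()) / len(restore_times)
        if restore_times
        else None
    )
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": mean_restore,
    }


# Made-up sample data: three deployments over a two-day window.
t0 = datetime(2017, 1, 2, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(minutes=10)),
    Deployment(
        t0 + timedelta(hours=1),
        t0 + timedelta(hours=1, minutes=20),
        failed=True,
        restored_at=t0 + timedelta(hours=1, minutes=50),
    ),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, minutes=30)),
]

report = summarize(sample, period_days=2)
for name, value in report.items():
    print(f"{name}: {value}")
```

With the three sample deployments, the report works out to 1.5 deploys per day, a median lead time of 20 minutes, a change failure rate of one in three, and a 30-minute time to restore.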
Last year we found another dimension of quality: because high performers are integrating information security objectives into every stage of everybody's daily work, they're spending one half the amount of time remediating security issues. And because they're able to control unplanned work, they're able to devote nearly one third more time to planned, new work. So that's the higher-value strategic activities versus the lower-value firefighting, which is probably the first page of The Phoenix Project; that is what the firefighting feels like.

In 2014 we found that not only did high performers have better IT performance, as measured by deployment frequency, lead time, mean time to repair, and change success rates, they had better organizational performance. We found that high performers are twice as likely to exceed market share, profitability, and productivity goals, and for those nearly 1,000 organizations that year that gave us a stock ticker symbol, they had 50% higher market cap growth over three years.

This last year we found another marker of organizational performance: in high performers, employees are 2.2 times more likely to recommend their organization as a great place to work to their friends, as measured by the employee Net Promoter Score. This is just a great proxy measure for our organization's ability to hire and retain great talent. So I think all of this gives us a better ability to sell DevOps within our organizations. Nicole and I will be working on the 2017 State of DevOps Report. I can't tell you a lot about it, but it's freaking awesome. Just when you think you've learned it all, hey, there's more to learn.

So surprise number one was just to what extent high performers were outperforming their non-high-performing peers. Surprise number two is how DevOps is as good for operations as it is for development. One of the cases that I got a chance to study with Jez Humble for weeks was this case study from back in 2008. It's the Facebook chat launch story, and
some of you may roll your eyes, saying, wow, what could be so interesting about that? Chat servers are what undergraduate CS students write as part of their sophomore year of college. Although that's true, what you may not know is that chat is inherently an order n-cubed algorithm, and at Facebook n is 70 million simultaneous users. So that was actually considered one of the most massive technical undertakings at Facebook. It took them one year to do, and it was one of the largest project teams ever assembled.

There were two technical practices that blew me away. One was the notion that they were testing in production. Well, how did they use that year? As soon as they constituted the chat team, even on day one, they were checking the code into a shared source code repository, and anything that was in the source code repo in trunk would be migrated to the production environment at least once per day. And they would do it in the middle of the day, 2 p.m.
Pacific time. Not at midnight, not on Friday, not working a weekend: they would do it in the middle of the day. The second thing was that they were using every active Facebook browser user session as a test harness. The reason they did this was so they could simulate production-like loads even at the earliest stages of the project. The result was that when they dark launched this one year later, they went from zero users to 70 million users overnight. So that's a dark launch: the launch is simply changing a configuration flag, and if something went wrong, they could just undo that configuration change.

But for me the more interesting practice is that notion of a daily deployment: that they don't do it on Friday at midnight and make people work under horrendous conditions all weekend long to get things running before customers notice on Monday morning. I think the best verbalization of this came from Nathen Shimek. He told me in 2013 (it was actually at Jez Humble's conference, FlowCon, at the bar), "As a lifelong ops practitioner, I know that we need DevOps to make our work humane." He said, "Over the course of my career I've worked on every holiday, on my birthday, even worse on my spouse's birthday, and even on the day my son was born."

I think some of you may have friends who have been in this situation, where out of a sense of duty or obligation, or maybe because they didn't have a choice, they've had to do this. And some of you may be like me, where you've been a part of the leadership that created these inhumane work systems. I think what makes DevOps have so much significance is that we now know that there's a better way. It doesn't have to be this way.

Lest you think that this is only possible with open-source hippie companies like Facebook, you should know about this case study from Scott Prugh from CSG. They're the largest bill printing company in the United States. They're
publicly traded, and if you get a paper bill from Comcast, Charter Communications, or DirecTV, chances are that it comes from one of the two bill printing plants in the U.S. from CSG. This thing is, in my mind, one of the pathological worst-case architectures that you can do DevOps on. This bill printing application runs on 20 different technology stacks, including .NET, thick clients, thin clients, J2EE, mainframe COBOL, mainframe assembler, mainframe VSAM, and mainframe DB2. It's all in there. To execute a deployment, it required 20 simultaneous deployments on 20 different technology stacks for it to work, and it would take them 14 days to execute.

So over the course of a year they went through a DevOps transformation, and they went from two releases a year to four releases a year. But success was predicated on the notion of a daily deployment: every day, a team spanning dev, test, and operations would deploy into a UAT environment. What were the outcomes? Within a year, incident count went down by 90 percent and mean time to repair went down by 98 percent, but most importantly the code deployment lead time went from 14 days down to a day. That was 14 days of a release team trapped in a war room trying to get things running, with executives coming in every hour saying, "Are we done yet?", to which they would have to always respond, "No, we're not done yet, we have 13 more days to go." From 14 days, within a year it was done by 1:00 p.m.
on day one, and the Xbox would come out, because there were no live-site incidents. So it's great for dev, test, and operations. It's also great for the business and customers, because often they can get the value of the features in half the time.

Yet there's another side. One of the more interesting patterns for me in watching DevOps is this notion of developers being put on pager rotation. Patrick Lightbody said in 2011, "What we found was that when we woke up developers at 2:00 a.m., defects got fixed faster than ever." And Werner Vogels put it even more strongly: essentially, if you helped build it, you must help run it.

I'm very well aware that jackasses like me showing off a jackass slide like this are probably mobilizing an entire generation of developers to hate DevOps. They will sabotage every DevOps effort they see, because they would say, "We did not become developers to wear a pager. Pagers are for ops people. The whole reason they became ops people is because they like pagers." Although I will recognize that there is an internal consistency to that logic, I think there is a more compelling narrative, and that comes from Tim Tischler. For many years he led the DevOps initiative at Nike, and he said, "As a developer myself, the most satisfying point of my career was when I got to write the code, when I got to test it myself, when I got to push it into production myself, when I got to see happy customers when it worked, and when I got to see their angry shaking fists when it didn't work, and when I could fix it myself." He said the point is not that I didn't have to open up a ticket and wait a day; the point is not that I could have fixed it faster, which is true; the point is that I could have learned something, so I didn't make the same mistake the next time around. What he said was that our ability to self-test, self-deploy, and self-fix has diminished over the last decade, and paradoxically it is actually being put on pager rotation that
allows us to get that joy back. How am I doing? Am I being too cavalier about that claim? Kind of cool, right? It's great for ops and it's great for dev, and I think it's because ultimately we're both engineers. Increasingly, as ops professionals, I think we're going to be using development philosophies, development practices, and increasingly the same tools as developers, and ultimately it's because we're both engineers working in the value stream together.

So surprise number one was the business value of DevOps. The second surprise was just how great DevOps is for both development and operations. The third surprise is that there's a measurement that looks very tactical and easy to dismiss, but I actually think it's probably the most strategic measurement of any technology organization: specifically, it's code deployment lead time.

In the DevOps community, I think the one metric that we love talking about is deploys per day. It's kind of embedded into the DevOps community: at Flickr, "we did ten deploys a day, every day," as per the famous Allspaw and Hammond slide in 2009. But in the lean community, especially manufacturing, that is obviously not the favorite metric. Their favorite metric is lead time, and there's this deeply held belief that goes back almost 60 years that says lead time is the most accurate predictor of internal quality, external customer satisfaction, and even employee happiness. What we found in our benchmarking work spanning 26,000 respondents is that that property applies to our work as well.

In manufacturing, they would probably measure lead time as how quickly we can go from a customer order, or raw materials arriving at one of the plants, to finished goods leaving the plant. In our world, in our research, we specifically measured lead time as starting at the point at which changes are introduced into version control, then through integration, through test, and through deployment, so that customers are actually getting value.
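To make the lead time definition above concrete, here is a small Python sketch that starts the clock at the commit to version control and stops it at the production deployment, with a per-stage breakdown. The stage names and timestamps are assumptions made up for illustration, not data from the research.

```python
from datetime import datetime, timedelta

# Hypothetical pipeline stages, in order. The clock starts at "committed"
# (the change enters version control) and stops at "deployed".
STAGES = ["committed", "integrated", "tested", "deployed"]


def lead_time(events):
    """Code deployment lead time: commit to running in production."""
    return events["deployed"] - events["committed"]


def stage_durations(events):
    """Time spent between each pair of consecutive stages."""
    return {
        f"{start}->{end}": events[end] - events[start]
        for start, end in zip(STAGES, STAGES[1:])
    }


# Made-up timestamps for a single change.
change = {
    "committed": datetime(2017, 3, 1, 14, 0),
    "integrated": datetime(2017, 3, 1, 14, 5),
    "tested": datetime(2017, 3, 1, 14, 25),
    "deployed": datetime(2017, 3, 1, 14, 40),
}

total = lead_time(change)
breakdown = stage_durations(change)
slowest = max(breakdown, key=breakdown.get)

print("lead time:", total)
for stage, duration in breakdown.items():
    print(f"  {stage}: {duration}")
print("slowest stage:", slowest)
```

For this sample change the lead time is 40 minutes, and the breakdown shows where the time goes: 5 minutes to integrate, 20 minutes in test, and 15 minutes to deploy.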
I think that raises the question: why do we start the lead time clock there? Why don't we start it earlier, when a feature is accepted by development, or even when an idea is first conceived? The big learning for me was that the point at which changes are introduced into version control is a dividing line between two very different parts of the technology value stream.

To the left of being put into version control is design and development. The nature of design and development work, like the Facebook chat story, is that often we're doing work for the first time, maybe never again to be repeated. So the lead time for that kind of work is longer and more highly variable, because we never get a chance to practice it. I think that's a fact of life.

For everything to the right of changes being committed into version control, we want the exact opposite characteristics of work. That's testing and operations. We want testing and deployment to be happening all the time, we want them to happen mechanistically, repeatedly, the same way every time, and we want them to happen quickly.

So code deployment lead time predicts the effectiveness of testing and operations, but it simultaneously predicts how quickly we can create feedback for design and development. In other words, if I'm a developer and I make an error, and I only find out about it nine months later during integration testing, then the link between cause and effect has surely been lost. This is where it gets into the blame game: who changed what, not me, it must be that person, whatever. In the ideal, that error should be detected within minutes, because that's when we kick in quick automated testing. That not only creates fast feedback for developers, it also predicts how quickly design and development can get feedback from external customers. You can't do a lot of experiments if you're only deploying once a year. So code deployment lead time is a great predictor
of testing and operations effectiveness, as well as of how quickly we can create feedback to design and development. That was learning number three.

So surprise number one was the business value of DevOps, two was how great it is for dev and ops, and three is this tactical measurement that's actually not so tactical at all, but in my mind is probably the most strategic measure of any technology organization. Surprise number four was Conway's law. I noticed that around 2012 it was very difficult to go to any DevOps event and not have someone talking about Conway's law. For those of you who aren't familiar with it, I think the most popular incarnation of it was actually framed by Eric Raymond. He wrote the great book The Cathedral and the Bazaar about open source, and he's also responsible for the Hacker's Dictionary, and his definition was this: "If you have four groups working on a compiler, you will get a four-pass compiler." This is based on a famous experiment that Dr. Melvin Conway did in 1968.

I think I kind of intellectually got that, but, I'll be honest, there's no way that I could have actually explained back to you how that would impact how we design our work and how we execute that work in a DevOps value stream. One of the biggest aha moments for me was just seeing Conway's law at work in the famous story about Sprouter at Etsy. So let me just tell you the story and show you how Conway's law actually is the backdrop of it.

Back in the bad old days at Etsy in 2008, in order to ship any piece of business functionality, there were two teams that were required to do work. You had the devs working in the front end in PHP, and you had the DBAs making changes to the stored procedures in Postgres. These two teams would have to coordinate, marshal, sequence, prioritize, blah blah blah. They said this is a problem, so the solution in 2009 was to create Sprouter. Sprouter stands for "stored procedure router." The
idea was that it was going to enable the devs and the DBAs to work independently, and they would meet in the middle, inside Sprouter. The problem is that now we went from two teams having to coordinate, sequence, and marshal work to three teams having to coordinate, sequence, and marshal work. As they said, this required a degree of synchronization and coordination that was rarely achieved: every deployment became a mini outage.

So as part of the great Etsy transformation, they said, all right, where we have to go is being able to fully empower developers to independently make changes just by making changes inside of PHP. They created this PHP ORM (object-relational mapping) thing so that they wouldn't have to make changes to the database directly. The end result was that not only did reliability go up, but lead time went way down. I think this shows how one of the goals of DevOps is to fully enable small teams to independently develop, test, and deploy value to customers.

I think this is a great example, at least it was for me, of how Conway's law can hurt us, as from 2008 to 2009 we went from two teams to three teams, and how Conway's law can help us, as we went from three teams down to one. The real lesson for me was that the organization and our software architecture must be congruent. It's not enough to shuffle teams around; we must also have an architecture that enables those teams to independently create and deploy value to customers. We can't do that if all the teams are tightly coupled together, which means that every time we want to do something we have to coordinate with maybe hundreds of other developers, testers, and ops people. And incidentally, that becomes almost impossible when we have organized our teams by technology. If we have 20 different teams, with DBAs here, Postgres there, MySQL there, Oracle there, J2EE there, Windows there, now it means
everything has to have a lot of handoffs.

So surprise one is the business value of DevOps, two is that DevOps is great for dev and ops, three is code deployment lead time, and four is that Conway's law has a lot to do with the outcomes that we get. Surprise number five is that when historians look back at the DevOps movement, and technology in general, I think they'll say DevOps is probably a subset of something much larger, and they would probably call it dynamic learning organizations.

Dr. Steven Spear wrote what is probably one of the most famous Harvard Business Review papers, called "Decoding the DNA of the Toyota Production System," and that was based on the Ph.D. dissertation he did at the Harvard Business School. As part of that he actually worked on the assembly line, on the production floor of a tier 1 supplier for Toyota, for six months. Before Toyota would let him do that, they said you must first work in a Big Three auto plant for 30 days, essentially saying you will not understand the lessons that are going to be imparted to you until you work in a more conventional plant. He extended that work beyond the Toyota Production System to helping build a safety culture at Alcoa, to engine design at Pratt & Whitney, to the design and operations of the naval reactors program inside the U.S. Navy, and so forth. And he said that while designing perfectly safe systems is likely beyond our abilities (and by the way, that is work that's more dangerous and complex than the work that we do), safe systems are close to achievable when the following four conditions are met.

What I want to do is share with you what those four conditions are, and I'll just highlight some of the technical practices that they might remind you of. But I want to focus on one of those capabilities, because when I took his workshop at MIT, it just hit me that there was a big blind spot that we had within the DevOps Handbook authorship team. In fact, I would blame Dr.
Spear for about a two-year delay in the five years that it took to get The DevOps Handbook out. Dr. Spear asserts that there are four conditions that must exist.

One, you must see problems as they occur. In other words, any assumptions that we have that are incorrect must be quickly revealed, both in the design and the operations phase of any complex work system. That's like assertion statements in code, that's like production telemetry, that's like all the telemetry you want to put everywhere, so we can actually see whether the system is behaving as we think it is, and so we can actually correct it.

The second capability is that when bad things happen, we must swarm them, not only so we can restore service faster but so we can create new knowledge. The paragon of this principle is the Toyota Andon cord. When something goes wrong, you pull the cord and the entire assembly line stops, and they do it 3,500 times a day. Essentially what they're saying is that we need to make systemic fixes then and there, because if we don't, we're going to have the same problem 55 seconds later. That's the notion of the daily workaround. Daily workarounds happen in our world too, but because our work takes longer than 55 seconds, it's just less visible; it is just as destructive. In our world this would be like continuous testing, continuous builds, continuous deployment, dropping whatever it takes when something goes wrong, and helping peer-review other people's code, because getting their changes into production is actually more important than whatever I'm doing right now. And tomorrow it might be the other way around: I might need someone else to peer-review my code, because lead time predicts effectiveness.

But the real big surprise for me was capability three: there has to be some mechanism by which local discoveries can be integrated to create global greatness. In other words, how do we elevate the state of the practice so that genuine
learnings are created and integrated everywhere? I want to share with you number four, leaders create new leaders, but let me first share with you what capability three is all about, because for me this was the most profound. What are the ways that we propagate learnings in the DevOps community?

One is the notion of a single shared source code repo, and I think this is so important for operations and information security. The whole idea is that we put our best expertise into code, so that anyone who pulls from it can inherit the best known understanding and expertise of the entire organization. The most famous example of this is the monolithic source code repo at Google. Every engineer has access to all the Google properties, everything gets executed through a continuous deployment pipeline and build systems, and inside the repo only one version of each library is allowed. Contrast that with a friend of mine at a large bank, who said, "Of the 93 versions of Java Struts, we are running 92 of them in production." So: consistency and conformity.

Blameless post-mortems: this is the idea that when something goes wrong, we create the conditions so that we can talk about problems. As Bethany Macri from Etsy said, prevention requires honesty, and honesty requires safety. How do we make it possible to really create an accurate timeline of what actually happened, so that we can talk honestly about what the right countermeasures should be? The idea is to prevent those bad things from happening again, and if you can't prevent them, at least enable quicker detection and recovery.

Chaos Monkey: eventually you run out of things to talk about. If you stop having Sev 1 outages, you don't have blameless post-mortems, so you talk not just about customer-impacting incidents, you talk about team-impacting incidents: of the seven safeguards that were designed to prevent a customer-impacting incident, six of them failed. And if you run out of those, eventually
you have to create your own failures, like Chaos Monkey, where Netflix randomly kills production compute instances, all the time, in production. By the way, did you know that they only run that during office hours? In other words, you don't actually want to wake people up needlessly at 2:00 a.m. You do it when everyone's in the office, just like when you would do a deployment.

Learning days, DevOpsDays, internal technology conferences: these are another way we can have the people who are creating greatness spread and propagate it, and set the cultural norm that this is what we want within our workforce. Certainly I think open source is a part of that. Just one little slide on this: one of my favorite quotes, which I think about all the time, is this: "You're only as smart as the average of the top five people you hang out with." I think it's organizations like this where we can actually create the peer group where we can actually learn. Oh, sorry, I actually put these slides in for you, but I don't have time.

All right, surprise number six. I think there's this myth that DevOps is just for the unicorns, that's Google, Amazon, Facebook, and that it's not for the horses, the large, complex organizations that have been around for decades or even centuries. This has been my area of passion for the last four years: studying not so much the unicorns but how DevOps principles and patterns are being used in organizations that have been around for decades or maybe even centuries. We're going into the fourth year of a conference that we call the DevOps Enterprise Summit, and the goal is really to capture these learnings. In fact, there are 48 case studies in The DevOps Handbook, 30 of them are from large, complex organizations, and they almost all came from this conference. Every year we ask leaders of technology organizations to tell experience reports: here's the industry we
compete in, here's my organization, here's where I fit in the org chart, here's the business problem we set out to solve, here's what we did, here's what we learned, and here are the problems that still remain. The reason for that is that as adult learners, we don't learn so much from theory and from what people say we should do; we learn from what people did, so that we can draw out the learnings that we need.

By the way, we did one in London last year, and what was interesting about London was just the age of these organizations. Barclays was founded in the year 1634. HMRC, Her Majesty's Revenue and Customs, was founded in the year 1200. I don't think there's any code that goes that far back, but there are certainly traditions and values and practices that go that far back.

There are many, many awesome DevOps outcomes. There should be no doubt that large, complex organizations are achieving the same outcomes that the unicorns have been achieving. But there's one thing that is astonishing to me, which is the level of courage being exhibited by these leaders. I think every one of them was given some degree of air cover, but almost every one of them at some point in their journey wildly exceeded the air cover they were given, essentially putting themselves in some degree of personal jeopardy. So the question is, why would they do that? I think the reason is that every one of them had a sense of absolute clarity and conviction that what they were doing for their organizations was needed, not just to survive in the marketplace but also to win in the marketplace.

As a little example of what courage looks like: I got to shadow Heather Mickman. For many years she was a senior development director at Target, and I noticed this certificate on her desk. It looks like a Print Shop Pro type of thing, a certificate, and it says, "To Heather Mickman: Lifetime Achievement Award for annihilating TEP and LARB." So I asked, what is TEP
and LARB? TEP stands for the technology evaluation process, and LARB stands for the lead architecture review board. Whenever you want to do something novel and scary, like, say, use Tomcat, you would fill out the TEP form, and eventually you would get the right to pitch the lead architecture review board. So you walk into a room, and there are all the dev architects on one side and all the ops architects on the other side. They pepper you with questions, they start arguing with each other, and they assign you 50 more questions and say come back next week, or come back next month.

Her reaction was: no one on my team should have to go through this. In fact, none of the 2,000 engineers at Target should have to do this. In fact, why is this even here? She said no one could really remember, but there was some vague memory of something terrible that happened 16 years ago, and the details had been lost. Some months later they actually abolished the TEP and LARB. I think that is one of the markers of these people driving these transformations.

I have now actually learned how to talk about this more precisely. I talked to a friend, Dr.
Steve Mayner. As part of his Ph.D. program he was studying transformational leadership, and I asked him, what's that? He rattled off these characteristics, and my jaw hit the floor when I heard them, because in my mind they exactly verbalized the behaviors and values I've seen in these leaders. So I just want to share with you what those are.

The first one is inspirational motivation: can you articulate a clear vision, inspire passion, and help get other people on board? The second one is idealized influence: can you be a role model and set the example? Can you be a lifelong learner and encourage that in others? The third is individualized consideration: can you coach others, enable others, keep lines of communication open, and recognize other contributors? And the fourth one, which I love, is intellectual stimulation: can you challenge the status quo? Do you have this relentless need for improvement? In other words, just because we did it the same way for 16 years, it's suddenly unacceptable now, and that's where I'm going to focus. How do you empower decision making, and so forth.

We did this little experiment where we asked about a hundred people to take an MLQ assessment, and we found that the DevOps Enterprise community self-identify as transformational leaders. We actually integrated that into the 2017 State of DevOps Report, and, holy cow, there are stunning new findings to come.

So with that, why do I think this is important? Over the years I've thought about DevOps in a lot of ways in terms of what the mission is, and I think I've settled on this. IDC, the analyst firm, says there are about eight million developers on the planet and eight million ops people on the planet. At best, I think the wild optimists could say we're at 0.5 percent adoption. You take all the unicorns, you take the segments within the horses, and so
that says we have 99 percent left to go. I think the real goal is to get every one of those 16 million engineers to be as productive as if they were working at a Google, an Amazon, or a Facebook. There's no doubt that when we do that, we will unlock trillions of dollars of economic value per year. That's not going to happen in the unicorns; that's going to happen in every one of the largest brands in every industry vertical. So I think that's the mission at hand.

And yes, many people still joke that The DevOps Handbook took five and a half years to get out, but it is out, and again the thing I'm proudest of is the 48 case studies, most of them from large, complex organizations. I think it was definitely worth the time that we took.

So if you're interested in a 340-page excerpt of both The DevOps Handbook and The Phoenix Project, all the videos and slides from DevOps Enterprise that are online, the exciting white papers that we've been doing at DevOps Research and Assessment (DORA), and a whole bunch of other stuff, just send an email to realgenekim@sendyourslides.com, subject line "DevOps". Don't take a picture, don't write it down, just send an email to realgenekim@sendyourslides.com, subject line "DevOps", and you'll get an automated response in a couple of minutes.

So with that, Dominica, thank you so much, and I guess I get to hand it over to Nell. Thank you. [Applause]