ZetaVM, A Platform To Enable Programming Language Innovation

I was always fascinated by machines, ever since I was a kid. The first time I saw a telephone and a television, I had to ask my mom: how does the telephone work? How does the TV work? I found out very quickly that people, generally speaking, have no idea how these things work, and I just couldn't understand that. I had to know; I had to find out. I got very interested in electronics, and I wanted to build my own devices, play with them and experiment. But this was kind of problematic, because I was raised by a single mom on welfare and we were very poor, and electronics is a very expensive hobby: you constantly had to buy expensive parts, which were even more expensive in the 1990s.

Eventually I discovered computers. We had an old PC at home with QBasic on it, and I started to discover the world of programming. I discovered, to my amazement, that computers were sort of the ultimate machines. There's an infinite world of possibilities in what you can build with programming, and you're not limited by parts or by the physical world. The only limit is your imagination and the time you're willing to put into it.

I started programming really seriously when I was about 15. Like a lot of teenagers, I wanted to make games, and I started to learn C++ because that's what was used in the game industry. I was never focused enough to actually complete any of my game development projects at the time, but I discovered that I love programming for its own sake, as a creative outlet. And I think if you program enough, you very quickly start to run into limitations. Most mainstream languages have a number of warts, and very quickly you want to express something and find that the languages you're using don't really have the concepts necessary to express what you're thinking. I think it's natural, as a programmer, to get to a point where you would like to create your own programming language. I wanted
to do that too. I wanted to understand how programming languages are made, so eventually, in university, I went on to study compiler design and got a PhD in compiler design. During my PhD I learned how to build a compiler, but I realized that I didn't really know what my ultimate language should look like. Programming language design is actually really hard, and I think it takes a lot of experimentation to build a really good programming language.

So today I'm going to tell you about Zeta VM, which is my contribution to the area of programming language design. I decided that I wasn't going to create the ultimate programming language; instead, I was going to create my ultimate virtual machine to build programming languages on. Zeta VM is a VM for dynamic programming languages. One of its primary goals is to make language creation and experimentation more accessible to people with only a basic knowledge of programming. It's also my attempt to explore a few risky ideas in virtual machine design, to do things that maybe fly in the face of commonly accepted wisdom, and to try and shake things up. I don't know if Zeta VM will ever catch on and become a mainstream piece of software, but I think that at the very least, if I can demonstrate that some of these ideas work well, it's going to inspire other systems, and some of the ideas will spread to other implementations.

Some notable features of Zeta VM: it has a text-based image file format, built-in support for dynamic typing, first-class bytecode, an optimizing interpreter, and built-in multi-language support. I'm going to explain all of these in this presentation.

The basic data types supported by Zeta VM are booleans, integers, floats, and objects. It has an object model that resembles JavaScript's. The reason I made that choice is that the object model used by JavaScript is also similar to the object model used by PHP and Python and
a lot of other languages, and having support for it at the VM level makes it very easy to implement languages in that family, languages like Python, JavaScript, Lua, and so on. Everything else in the VM is built by composing these basic types together.

You might be wondering: why dynamic typing? Maybe you're someone who really likes statically typed languages, you hate dynamic typing, and you're wondering why I would even do that. The reason is that if you have support for dynamic typing at the VM level, you can still very easily implement a statically typed language that runs on Zeta VM and performs well. But if the VM does not support dynamic typing and you want to implement a dynamically typed language on it, it's going to be really difficult, and it's going to perform poorly.

The textual image format: what do I mean by this? Zeta VM takes some inspiration from Smalltalk. Smalltalk is a seminal programming language that has influenced the design of many other languages, and it had this concept of an image: you could suspend the execution of a program, store the entire heap (all the objects, all the memory) into a file on disk, and then later restore it and resume the execution of the program where you left off. Zeta has a textual file format, which is its native way of representing programs, packages, and data on disk. It's a format that resembles JSON, with the difference that it allows circular references, so it really represents a graph of objects that can refer to one another, and it can be used to represent both code and plain data. The reason I chose a textual format as opposed to a binary format is largely that I wanted to make the VM more accessible to people who want to implement their own language and who maybe wouldn't feel comfortable having to generate a
binary format. In a sense, I also think a textual format is more future-proof; it's easier to extend later on.

This is an example of what a Zeta image file might look like. Here we have a definition of a sum function, and we're giving it a name. It has a list of parameters, x and y, and inside it has a list of bytecode instructions, which are themselves described using objects. Here we're getting the two local variables x and y, adding them, stack-machine style, and then returning the result. At the end of the package we produce an object which is basically the list of symbols exported by this package, so here we're exporting the sum function.

I mentioned first-class bytecode; what do I mean by this? Zeta also takes some inspiration from Lisp, in the code-is-data sense. Zeta functions are objects, but not as in JavaScript. In JavaScript, functions sort of pretend to be objects; JavaScript pretends that everything is an object when it's not. In Zeta, functions literally are objects, as in they're native objects. What this means is that you can generate code on the fly: if you compose objects together and they have the right fields, you can build a function, you can build bytecode in memory, and then you can call it as a function. You can also poke into the code of existing functions, introspect it, and possibly modify it on the fly. This is actually a really powerful feature.

As a quick example of what you can do with this, you can implement functional-style partial evaluation, or currying, as a bytecode transformation. Hopefully this is not too difficult to understand: this is a function that curries another function. It takes as arguments the function and the value that you want to curry in for the last function argument. What it does is create a new entry point for the curried function, and in this entry
point we push onto the stack the value of the argument that we want to curry in, assign this value to the correct local variable, and then jump to the existing entry point of the function. So we're just creating a new function with a new entry point that does the currying and then jumps to the existing function's entry point.

Here is an example of what you can do with this. This is for sound generation: there's a function f in there that takes a time and a frequency and returns the amplitude of a sine wave at that time. We can curry the sine function on the frequency: you can call sine with a frequency like 300, and it returns a new function that only takes the time and produces a 300 Hz sine wave. You might be wondering why you would use currying for that; can't you use closures anyway? Well, the point is that you can implement a new language on Zeta VM that doesn't even have closures, and you could curry functions in your language without knowing anything about how currying is implemented. If it runs on Zeta VM, you can do currying without having to care about the implementation details.

Moving on to the JIT interpreter. Zeta has an interpreter internally; it doesn't yet generate machine code. So there's a problem here, right? I presented this bytecode format that's all made out of objects in memory; you create bytecode by creating objects, and that looks like it's probably really inefficient. If you write an interpreter that just traverses a graph of objects, with all these property lookups and all this indirection, it's going to be dog slow. The solution I found to this problem was to create an interpreter that lazily translates the object-based bytecode into a flat, compact internal representation. Basically, it's an interpreter that does some amount of JIT compilation and some
amount of optimization at the same time. This delivers a speedup of about 25 times over the naive interpreter implementation, which is enough to do some fun things like audio signal generation. It's getting to the point where it's nearly as fast as CPython, and I think in the next few months it's going to become much faster.

As part of Zeta VM I also implemented a toy language to run on this VM. This language is called Plush, and it's inspired by JavaScript and Lua. The main goals of this language were to bootstrap and test the system, and also to serve as an implementation example for people who come to Zeta VM wondering how to implement a language on it. I wanted a simple language with a really well-commented implementation that people could look at and modify if they want to. There are currently two implementations of the Plush language. There's one called cplush, which is written in C++; it takes Plush source files as input and produces Zeta image files as output, so it produces something like a compiled package. There's also a self-hosted Plush implementation, which is itself written in Plush, and this is the Plush language package. I'll explain in a moment why this is useful.

Zeta natively supports multiple languages by delegating the parsing of source code to language packages. This means you can add support for your language to the virtual machine in a way that feels native, by implementing a language package. When you write a source file that you want to run on Zeta, you begin it with a language line, which tells Zeta which parser to load, and Zeta will call into that package to parse your code. The reason there's both a C++ Plush implementation and a self-hosted package is that the cplush compiler was needed to bootstrap the system, so that we
could have a way to write code on this system without writing bytecode directly by hand.

This is an example of a little Plush source file that has a Fibonacci function. The first line in the file is the language line, which tells Zeta that in order to parse this source file it needs to load the Plush package. Zeta loads the Plush package and passes the source of this file to it; the package parses the source code and generates objects in memory that represent the intermediate representation for this code, which Zeta then executes. The point is that if you write your own language package, you can make your own language for this VM, and it's going to feel native in this way.

So far I've presented the things that are already implemented in Zeta VM. I also have some pretty ambitious plans for where I want to go next with this virtual machine. Some of these ideas might seem pretty out there, but I think they're actually fairly realistic.

Zeta is going to have a package manager. What's going to be different from other package managers is that it will be integrated into the VM itself, so packages will be downloaded automatically when needed, without you having to tell the package manager to go fetch them. The reason this is particularly interesting is that the killer feature of Zeta VM, so to speak, will be that you can make your new language accessible to everyone instantly. You will be able to write a language package for the VM and push it to the package manager, and as soon as you push it, you can tell your friends "my package is at this address," and they can start writing code in your language and run it on Zeta VM in a way that feels native. Another interesting aspect is that packages will be versioned and immutable. That means once you push a package to the package manager, it will
have an address, and that address will always refer to that same package. If you push a new version of the package, it will have a new version number, and so on, so the packages that you depend on cannot change under you. That makes it less likely for code to break.

The reason I'm talking about this is that I have an ambitious goal with this project, which is to explore ways to mitigate the phenomenon known as code rot. I think code rot is actually one of the biggest problems in software development today, and it's accepted as normal, which is kind of strange. Basically, code breaks constantly, even if you don't touch it. If you write a piece of code now, it's possible that in a couple of months it won't work anymore, because some of its dependencies will have changed. If you try to run a piece of code that was written 10, 15, or 20 years ago, especially a piece of C code, it's quite possible that it won't work, even though that code hasn't changed at all. People accept that this is normal, which I think is really unfortunate, because it means that a lot of code just goes to waste: it has to be thrown away and constantly rewritten. We're wasting millions of hours of work this way. I believe the problem is that the foundations are constantly changing under your feet: your code might not change, but everything your code depends on is changing.

So how could we tackle code rot? Could we design programs so that they keep working correctly in 20, 30, or 50 years? The first observation I'll make is that there is a class of systems you're already familiar with where code rot is not a problem, and that is emulators. If you take a program that ran on a Super Nintendo and run it now, it's still going to work correctly: you could have a Super Nintendo emulator
that will run those programs, and provided it follows the spec exactly, those programs are still going to work correctly. The same goes for old Atari programs and old Commodore 64 programs. The reason those programs still work correctly, of course, is that the Super Nintendo and the Commodore 64 are not changing. Maybe that's a little crazy, because the general problem is more complicated than that: as soon as your program starts interacting with the outside world, you can't guarantee that the outside world is not going to change. But I do think we can design systems to at least reduce the probability that code will break.

The approach I plan to take in Zeta is to be intentionally minimalistic. I think if Zeta remains smaller and simpler, if it provides fewer things, then we can make sure that what it does provide is better tested and has fewer corner cases, and so fewer chances of breaking. Zeta will also intentionally try to avoid undefined behaviors. The C programming language has a lot of undefined behaviors, and it has them because of the belief that they give a performance edge. In Zeta we will make the opposite choice: if we find an undefined behavior, we will try to define it in the specification to make it disappear. We will pick reliability over tiny performance gains. The immutable, versioned package manager also fits into this approach. It's a more functional approach to software development: when you import a package, you explicitly specify "I want version 15 of this package," you know that version 15 is not going to change, and you know that this package relies on features of the VM that are frozen, so there's much less chance that it's going to break. Of course, if you interact with the outside world, with some external web API for example, I can't promise that the external web API is not going to change, but I can
at least give you a chance of writing a piece of software that will still work in the future.

Functional graphics is another thing I would like to explore with Zeta. Zeta will take a functional approach to 2D and 3D graphics. It's sort of like going back to the 1980s: instead of having polygons, Zeta will do per-pixel graphics, plotting individual pixels. That probably sounds completely crazy, because it flies in the face of the way everybody else does 2D and 3D graphics right now, but I think I can make this work fast. The way I plan to make it fast is to provide a parallel map operator, where you pass a function of the coordinates x and y, and your function returns the color of the pixel at that position. Zeta will try to automatically parallelize this function and run it on the GPU. Essentially, it's a pixel shader, so Zeta will do graphics using pixel shaders. The inspiration for this is the demoscene and signed distance fields; if you're not familiar with those, I encourage you to check them out when you have time. It's possible to do some pretty amazing graphics in terms of signed distance fields and run them on a GPU nowadays. This is very different from, say, Java, but where this approach really shines, I think, is that Zeta only needs to provide one function: this operator where you pass it a function that gives the color of a pixel. That is basically the Zeta graphics interface. It's really simple, and it's much less likely to break than Java's approach, where Java exposes a billion different UI widgets that don't behave the same on every platform.

So far, I have a VM with a simple optimizing interpreter. I've implemented some basic file I/O, audio, and graphics APIs, we have the beginnings of a standard library, there's a C++ implementation of Plush, and there's a
self-hosted implementation that I'm currently working on. My friend Marco has implemented another language called Espresso, which is implemented in Python.

As for the roadmap, where I want to go next: I would like to complete the Plush bootstrap using serialization, so the cplush implementation is going to be deprecated soon and only the self-hosted implementation will remain. I need to implement a garbage collector; Zeta VM doesn't even have one right now. It just allocates memory until either your program terminates or things blow up. I would like to implement inlining at the interpreter level, so the interpreter itself will do inlining, which I think should give it about a 10x speedup. There's eventually going to be a JIT compiler, and finally execution on GPUs, probably much later.

I realize there's a bit of irony in this talk, because I'm talking about eliminating code rot, but the project I'm presenting is really young, so it's going to change a lot. If you write code for Zeta VM now, it's almost guaranteed to break. But I think this is normal. Before I can get to the point where the bytecode format that Zeta runs is frozen, there needs to be a prototyping stage, there needs to be experimentation, in order to figure out what works well and then stabilize things. There will come a point where the bytecode format is frozen and the core APIs provided by the VM are mostly unchanging, but we're not quite there yet.

In conclusion, Zeta VM is a young open-source project to create a VM with a different take on computing platforms. The hope is that this VM will allow you to create a language of your own with very little code, and to instantly publish your language and make it available to everyone. The goal of Zeta VM is also to explore ideas, to really try things that people in the industry
don't dare to try because they're a little too out there. I hope that one day this VM will support a rich ecosystem of languages. I'm looking for contributors, if you're interested in collaborating on this project. That's it. If you're interested, you can check out the project on GitHub, and you can follow me on Twitter. I'd also like to give a shout-out and a big thank you to all the people who have contributed to the project so far.
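The first-class bytecode idea from the talk (a sum function whose body is a list of instruction objects, and currying done by adding a new entry point that sets the last argument and jumps to the original entry) can be modeled in a few lines. This is a minimal Python sketch, not Zeta's implementation: the dict fields (`params`, `entry`, `instrs`), the opcode names, and the helpers `make_function`, `call`, and `curry_last` are all hypothetical, chosen only to show functions as plain data that other code can build and rewrite.

```python
import math

# Hypothetical instruction set for illustration only; Zeta's real
# instruction names and image fields may differ.


def make_function(name, params, entry):
    """A function is just an object: a name, parameter names, an entry block."""
    return {"name": name, "params": list(params), "entry": entry}


def call(fn, *args):
    """Run a function object on a tiny stack machine."""
    if len(args) != len(fn["params"]):
        raise TypeError(f"{fn['name']} expects {len(fn['params'])} argument(s)")
    local_vars = dict(zip(fn["params"], args))  # parameters start as locals
    stack = []
    block, pc = fn["entry"], 0
    while True:
        instr = block["instrs"][pc]
        pc += 1
        op = instr["op"]
        if op == "push":
            stack.append(instr["val"])
        elif op == "get_local":
            stack.append(local_vars[instr["name"]])
        elif op == "set_local":
            local_vars[instr["name"]] = stack.pop()
        elif op in ("add", "mul"):
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if op == "add" else lhs * rhs)
        elif op == "sin":
            stack.append(math.sin(stack.pop()))
        elif op == "jump":
            block, pc = instr["to"], 0  # continue in the target block
        elif op == "ret":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction {op!r}")


def curry_last(fn, value):
    """Bind fn's last parameter to `value` by creating a new entry block.

    The new block pushes the value, stores it into the last parameter's
    local variable, then jumps to fn's existing entry block, so the
    original bytecode is reused unchanged.
    """
    last = fn["params"][-1]
    entry = {"instrs": [
        {"op": "push", "val": value},
        {"op": "set_local", "name": last},
        {"op": "jump", "to": fn["entry"]},
    ]}
    return make_function(fn["name"] + "_curried", fn["params"][:-1], entry)


# sum(x, y): get both locals, add them, return the result
sum_fn = make_function("sum", ["x", "y"], {"instrs": [
    {"op": "get_local", "name": "x"},
    {"op": "get_local", "name": "y"},
    {"op": "add"},
    {"op": "ret"},
]})

# sine(t, freq) = sin(2 * pi * freq * t)
sine_fn = make_function("sine", ["t", "freq"], {"instrs": [
    {"op": "push", "val": 2 * math.pi},
    {"op": "get_local", "name": "freq"},
    {"op": "mul"},
    {"op": "get_local", "name": "t"},
    {"op": "mul"},
    {"op": "sin"},
    {"op": "ret"},
]})

sine_300hz = curry_last(sine_fn, 300)  # a function of time only
```

For example, `call(sum_fn, 2, 3)` returns 5, and `call(sine_300hz, 0.25 / 300)` evaluates the 300 Hz wave a quarter period in, which is approximately 1.0. Because currying only prepends a block, the curried function never needs to know how the original body was written, which is the property the talk highlights for languages that lack closures.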
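The talk explains that Zeta avoids walking the object graph on every step by lazily translating object-based bytecode into a flat, compact representation. The sketch below illustrates only that general technique under stated assumptions: the numeric opcodes, the `_flat` cache key, and the `LazyInterpreter` class are hypothetical, and nothing here reproduces Zeta's actual representation or its reported 25x speedup. Translation happens once per function, resolving local-variable names to slot indices ahead of time.

```python
# Hypothetical numeric opcodes for the flat representation.
PUSH, GET_LOCAL, ADD, MUL, RET = range(5)
OPCODES = {"push": PUSH, "get_local": GET_LOCAL, "add": ADD, "mul": MUL, "ret": RET}


class LazyInterpreter:
    """Translate object bytecode to flat tuples on first call, then reuse it."""

    def __init__(self):
        self.translations = 0  # how many functions have been translated so far

    def _translate(self, fn):
        slots = {name: i for i, name in enumerate(fn["params"])}
        flat = []
        for instr in fn["instrs"]:
            op = OPCODES[instr["op"]]
            if op == PUSH:
                arg = instr["val"]
            elif op == GET_LOCAL:
                arg = slots[instr["name"]]  # resolve the name to a slot index once
            else:
                arg = None
            flat.append((op, arg))
        self.translations += 1
        return flat

    def call(self, fn, *args):
        flat = fn.get("_flat")
        if flat is None:  # first call: translate and cache on the function object
            flat = fn["_flat"] = self._translate(fn)
        local_slots = list(args)
        stack = []
        pc = 0
        while True:
            op, arg = flat[pc]
            pc += 1
            if op == PUSH:
                stack.append(arg)
            elif op == GET_LOCAL:
                stack.append(local_slots[arg])
            elif op == ADD:
                rhs = stack.pop()
                stack.append(stack.pop() + rhs)
            elif op == MUL:
                rhs = stack.pop()
                stack.append(stack.pop() * rhs)
            elif op == RET:
                return stack.pop()


# square_plus_one(x) = x * x + 1, written as instruction objects
square_plus_one = {"params": ["x"], "instrs": [
    {"op": "get_local", "name": "x"},
    {"op": "get_local", "name": "x"},
    {"op": "mul"},
    {"op": "push", "val": 1},
    {"op": "add"},
    {"op": "ret"},
]}
```

Calling `LazyInterpreter().call(square_plus_one, 3)` returns 10 and leaves the translated code cached, so later calls skip translation. One design caveat: since bytecode in Zeta is modifiable at run time, a real implementation would need to invalidate cached translations when a function's code changes; this sketch does not.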
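The language line mechanism described in the talk (the first line of a source file names a language package, and the VM hands the rest of the file to that package's parser) can be sketched as a registry lookup. Everything named here is an assumption for illustration: the `#language "…"` line syntax, the `lang/rpn/0` package name, the `LANGUAGE_PACKAGES` registry, and the toy reverse-Polish language are hypothetical stand-ins, not Zeta's actual package loader or the Plush parser.

```python
LANGUAGE_PACKAGES = {}  # hypothetical registry: package name -> parser function


def language_package(name):
    """Decorator that registers a parser under a language package name."""
    def register(parse_fn):
        LANGUAGE_PACKAGES[name] = parse_fn
        return parse_fn
    return register


@language_package("lang/rpn/0")
def parse_rpn(source):
    """Toy language: reverse Polish arithmetic parsed into instruction objects."""
    instrs = []
    for token in source.split():
        if token == "+":
            instrs.append({"op": "add"})
        elif token == "*":
            instrs.append({"op": "mul"})
        else:
            instrs.append({"op": "push", "val": int(token)})
    instrs.append({"op": "ret"})
    return instrs


def load_source(text):
    """Read the language line, then delegate the rest to that language's parser."""
    first_line, _, body = text.partition("\n")
    prefix = "#language "
    if not first_line.startswith(prefix):
        raise ValueError("source file must start with a language line")
    lang = first_line[len(prefix):].strip().strip('"')
    try:
        parse = LANGUAGE_PACKAGES[lang]
    except KeyError:
        raise LookupError(f"no language package named {lang!r}") from None
    return parse(body)


def run(instrs):
    """Execute the instruction objects produced by a language package."""
    stack = []
    for instr in instrs:
        if instr["op"] == "push":
            stack.append(instr["val"])
        elif instr["op"] in ("add", "mul"):
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if instr["op"] == "add" else lhs * rhs)
        elif instr["op"] == "ret":
            return stack.pop()
```

With this sketch, `run(load_source('#language "lang/rpn/0"\n2 3 + 4 *'))` returns 20. The VM side never needs to know the toy language's syntax; it only maps the name on the first line to a parser that produces ordinary instruction objects, which is what lets a new language feel native.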
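The planned package manager is described as versioned and immutable: once `name/version` is published it never changes, and importers ask for an exact version. The in-memory class below is a hypothetical sketch of that contract only; the `PackageRegistry` name, the `name/version` address format, and integer version numbers are assumptions, and it says nothing about how Zeta will download or store packages.

```python
from types import MappingProxyType


class PackageRegistry:
    """In-memory stand-in for an immutable, versioned package store."""

    def __init__(self):
        self._packages = {}  # (name, version) -> read-only exports

    def publish(self, name, version, exports):
        key = (name, version)
        if key in self._packages:
            raise ValueError(f"{name}/{version} is already published; use a new version")
        # Copy, then wrap read-only so later edits to `exports` can't leak in.
        # (Shallow: mutable values stored inside would still be mutable.)
        self._packages[key] = MappingProxyType(dict(exports))
        return f"{name}/{version}"

    def load(self, address):
        """Resolve an address like 'math/15' to that exact, unchanging version."""
        name, _, version = address.rpartition("/")
        return self._packages[(name, int(version))]
```

A dependent that loads `"math/15"` always gets the same exports, and publishing a change requires a new version number, which mirrors the talk's argument that dependencies should not be able to change under you.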
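The functional graphics plan reduces the whole graphics interface to one operator: map a function `(x, y) -> color` over every pixel, in parallel. The sketch below shows the shape of that interface with a signed distance field for a circle. The names `parallel_map_pixels`, `circle_sdf`, and `shader` are hypothetical, and a thread pool stands in for the GPU; in CPython the GIL prevents pure-Python shaders from actually running in parallel, so this illustrates the API, not the performance.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def parallel_map_pixels(width, height, shader):
    """Call shader(x, y) for every pixel; rows are handed to a thread pool.

    Stand-in for a VM-level parallel map operator that could instead be
    compiled for a GPU. Returns rows of colors, indexed as image[y][x].
    """
    def render_row(y):
        return [shader(x, y) for x in range(width)]

    with ThreadPoolExecutor() as pool:
        return list(pool.map(render_row, range(height)))


def circle_sdf(px, py, cx, cy, radius):
    """Signed distance from (px, py) to a circle: negative inside, positive outside."""
    return math.hypot(px - cx, py - cy) - radius


def shader(x, y):
    # Sample at the pixel centre; white (255) inside a radius-5 circle at (8, 8).
    d = circle_sdf(x + 0.5, y + 0.5, 8.0, 8.0, 5.0)
    return 255 if d <= 0.0 else 0


image = parallel_map_pixels(16, 16, shader)
```

Printing `"\n".join("".join("#" if p else "." for p in row) for row in image)` draws the circle as ASCII art. Because the shader is a pure function of its coordinates, each pixel can be computed independently, which is what makes automatic parallelization plausible.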