5:30 - 6:00 -- Drinks, nibbles and chatting
6:00 - 7:00 -- Neal Glew, talking about the Apache Beam Project (bio and abstract below)
7:00 - 7:30 -- Drinks, nibbles and chatting
Neal Glew is a software engineer on the Flume project at Google, where he mostly works on the shuffle system. He previously worked on parallel programming models at Intel Labs. He has a PhD in computer science from Cornell University and a BSc(Hons) in computer science from Victoria University of Wellington.
Apache Beam (https://beam.apache.org/) is an open-source project for writing big-data pipelines.
In the first part of this talk, I'll describe Beam from a non-technical perspective: what it is, why you would use it, and how it compares to other technologies in the big-data space.
In the second half of the talk, I will give a high-level overview of the technical aspects of Beam. At its heart is a programming model that unifies batch and stream processing, allowing the programmer to separate the what, where, when, and how of processing: what computation is performed on the data; where in event time that processing applies (how event times are windowed); when in processing time results are materialised; and how successive refinements of a result (e.g. due to late data) are combined. Beam also provides several language-specific SDKs that instantiate the model for particular languages. Currently Java and Python are available, and Go is under development.
Beam also provides a portability framework that allows pipelines to run on a variety of execution technologies. Beam itself provides a reference runner, and there are efforts to develop runners based on Apache Flink and Apache Spark. Google provides a commercial managed runner (Cloud Dataflow) on Google Cloud. Beam builds on the work of MapReduce, Hadoop, Flume, Spark, and Flink.