Nessie 1.0 - Data Ops for Data Lakes

Jan 25, 2022 · Mountain View, United States of America

Join us for the release of Nessie 1.0. Haven't heard about Nessie? Then read on:

Nessie is to Data Lakes what Git is to source code repositories.

Nessie is designed to give Data Lake users an always-consistent view of their data across all of their data sets (tables), wherever they are. Changes to your data, for example from batch jobs, happen independently and in complete isolation, so users never see incomplete changes. Once all the changes are done, they can be applied atomically and consistently, becoming visible to your users all at once.

Nessie completely eliminates the hard and often manual work of keeping track of individual data files: it knows which data files are in use and which can safely be deleted.

Production, staging and development environments can use the same data lake without risking the consistent state of production data.

Nessie does not copy your data. Instead, it references the existing data files, which is safe because data files are immutable.
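The Git analogy above can be illustrated with a toy model. The following Python sketch is hypothetical and is not Nessie's API or implementation: `ToyCatalog`, its methods, and the file paths are invented for illustration. It assumes that a table's state is just an immutable tuple of data-file paths, that a branch is a named pointer to a commit, and that merges are fast-forward only.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical model: a table's state is an immutable tuple of data-file paths.
FileList = Tuple[str, ...]


class ToyCatalog:
    """Hypothetical in-memory model of a Git-like catalog; not Nessie's API."""

    def __init__(self) -> None:
        # Commit 0 is an empty catalog. Commits are append-only and never mutated.
        self.snapshots: List[Mapping[str, FileList]] = [MappingProxyType({})]
        self.parents: List[Optional[int]] = [None]
        # A branch is only a named pointer to a commit index.
        self.branches: Dict[str, int] = {"main": 0}

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # Branching copies a pointer, not data files.
        self.branches[name] = self.branches[from_branch]

    def read(self, branch: str) -> Mapping[str, FileList]:
        # Readers always see a complete commit, never a half-applied change.
        return self.snapshots[self.branches[branch]]

    def commit(self, branch: str, changes: Mapping[str, FileList]) -> None:
        # The new snapshot reuses references to unchanged tables' file lists.
        head = self.branches[branch]
        snapshot = dict(self.snapshots[head])
        snapshot.update(changes)
        self.snapshots.append(MappingProxyType(snapshot))
        self.parents.append(head)
        # Moving the branch pointer is the single step that publishes the change.
        self.branches[branch] = len(self.snapshots) - 1

    def _is_ancestor(self, ancestor: int, commit: Optional[int]) -> bool:
        # Walk parent links from `commit` back towards the root commit.
        while commit is not None:
            if commit == ancestor:
                return True
            commit = self.parents[commit]
        return False

    def merge(self, source: str, target: str) -> None:
        # Fast-forward only, to keep the sketch short: the target must not have
        # moved since the source branched off. Real systems handle divergence.
        src, tgt = self.branches[source], self.branches[target]
        if not self._is_ancestor(tgt, src):
            raise ValueError(f"{target!r} has diverged from {source!r}")
        # One pointer update makes every commit on `source` visible on `target`.
        self.branches[target] = src


catalog = ToyCatalog()
catalog.commit("main", {"sales": ("sales/part-0.parquet",)})
catalog.create_branch("etl")
catalog.commit("etl", {"sales": ("sales/part-0.parquet", "sales/part-1.parquet")})
catalog.commit("etl", {"customers": ("customers/part-0.parquet",)})

# Until the merge, "main" still shows only the original sales file.
print(catalog.read("main")["sales"])  # ('sales/part-0.parquet',)
catalog.merge("etl", "main")
print(sorted(catalog.read("main")))  # ['customers', 'sales']
```

Because branches and commits only hold references to file paths, creating the `etl` branch copies no data, and the merge publishes both table changes to `main` in a single pointer update.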

Speaker: Ryan Murray, Co-creator of Project Nessie and OSS Engineer, Dremio

About the Speaker: Ryan Murray is an open source Engineering Lead at Dremio. He previously worked in the financial services industry in roles ranging from bond trader to data engineering lead. Ryan holds a PhD in theoretical physics and is an active open source contributor who dislikes it when data isn't accessible in an organization. He is passionate about making customers successful and self-sufficient, and still dreams of one day winning the Stanley Cup.

Agenda:
4:00 pm Welcome & Introductions
4:10 pm "Introduction to Nessie 1.0" by Ryan Murray
5:00 pm Q&A

Discussion: We've created a Slack that you can use during and after Subsurface LIVE.
