Lightning-fast SQL Queries + Transactions directly on the Data Lake

Oct 25, 2021 · Sunnyvale, United States of America

SCHEDULE (times in Pacific Time, San Francisco Bay Area):
6:50 Join Zoom; register to receive the Zoom link

7:00 SFbayACM intro, upcoming events, introduce the speaker

7:10 presentation starts (~60 min with Q&A)
8:10 - 8:30 wrap up

Data lakes have been built with a desire to democratize data: to allow more and more people, tools, and applications to make use of more and more data. A key capability needed to enable more users is the ability to hide the complexity of underlying data structures and physical data storage. The de facto standard has been the Hive table format, released by Facebook in 2009, which addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg. The Apache Iceberg table format is now used by, and receives contributions from, many leading tech companies such as Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.

In this talk you will learn:
* The issues that arise when using the Hive table format at scale, and why we need a new table format
* How a straightforward, elegant change in table format structure has enormous positive effects
* The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table’s underlying structure changes as CRUD operations are performed on it
* The resulting benefits of this architectural design
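One way to picture the Iceberg design described in the list above is with a deliberately simplified toy model. This is a sketch under stated assumptions, not Iceberg's API or file format: every class and function name (Catalog, append, plan_scan, and so on) is hypothetical, manifests are plain tuples, partitions are single string values, and file paths are made up. It illustrates the general idea: the table is tracked as immutable metadata (snapshots pointing to manifests pointing to individual data files) rather than as directories, a write becomes visible through one atomic swap of the catalog's pointer, and a query plans its file list from that metadata, optionally against an older snapshot.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str           # hypothetical file path, e.g. "data/a.parquet"
    partition: str      # partition value (here: an event date) tracked per file
    record_count: int   # a per-file stat a planner could also use


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    # Simplified stand-in for "manifest list -> manifest files":
    # each inner tuple plays the role of one manifest listing data files.
    manifests: Tuple[Tuple[DataFile, ...], ...]


@dataclass(frozen=True)
class TableMetadata:
    version: int
    snapshots: Tuple[Snapshot, ...]  # full history, which is what enables time travel

    @property
    def current(self) -> Optional[Snapshot]:
        return self.snapshots[-1] if self.snapshots else None


class Catalog:
    """Toy catalog (hypothetical): one pointer per table to its current metadata."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableMetadata] = {}

    def load(self, name: str) -> TableMetadata:
        return self._tables.setdefault(name, TableMetadata(version=0, snapshots=()))

    def commit(self, name: str, base: TableMetadata, new: TableMetadata) -> bool:
        # Optimistic concurrency: the pointer swap succeeds only if no other
        # writer has committed since `base` was read.
        if self._tables.get(name) is not base:
            return False
        self._tables[name] = new
        return True


def append(catalog: Catalog, name: str, files: Tuple[DataFile, ...]) -> bool:
    """Add data files as a new snapshot; earlier snapshots stay readable."""
    base = catalog.load(name)
    previous = base.current.manifests if base.current else ()
    # Existing manifests are reused as-is; only one new manifest is added.
    snap = Snapshot(snapshot_id=len(base.snapshots) + 1, manifests=previous + (files,))
    new = TableMetadata(version=base.version + 1, snapshots=base.snapshots + (snap,))
    return catalog.commit(name, base, new)


def plan_scan(meta: TableMetadata, partition: str,
              snapshot_id: Optional[int] = None) -> List[str]:
    """List files to read by filtering file-level metadata, not by listing directories."""
    if snapshot_id is None:
        snap = meta.current
    else:
        snap = next((s for s in meta.snapshots if s.snapshot_id == snapshot_id), None)
    if snap is None:
        return []
    return [f.path for manifest in snap.manifests for f in manifest if f.partition == partition]


# Usage: two appends, then a query on the current snapshot and on snapshot 1.
cat = Catalog()
append(cat, "events", (DataFile("data/a.parquet", "2021-10-25", 100),))
append(cat, "events", (DataFile("data/b.parquet", "2021-10-26", 50),
                       DataFile("data/c.parquet", "2021-10-25", 20)))
meta = cat.load("events")
print(plan_scan(meta, "2021-10-25"))                 # ['data/a.parquet', 'data/c.parquet']
print(plan_scan(meta, "2021-10-25", snapshot_id=1))  # ['data/a.parquet']
```

In real Iceberg, this metadata is persisted as files (JSON table metadata, Avro manifest lists and manifests) and the atomic pointer swap is delegated to a catalog; the toy keeps everything in memory to make the structure easy to follow.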

Apache Arrow, Apache Iceberg, Project Nessie, and Dremio technologies such as Data Reflections work together to speed up queries by up to 1,000x.

Jason Hughes is a Technical Director at Dremio. Previously, he was a Senior Solutions Architect at Dremio, and before that spent time in multiple roles at Teradata, including being the pre-sales and post-sales lead for Presto and Teradata QueryGrid for the Americas and in product management in Teradata's Analytical Ecosystem. Prior to that, he developed, deployed, and operated a custom CRM system for multiple auto dealerships. He is passionate about making customers and individuals successful and self-sufficient.

The open source projects involved are:
- Apache Iceberg
- Apache Arrow
- Project Nessie
- Dremio (see attached)
