Please note we're meeting in a different room at UST: McNeely Hall (MCH), Room 100, Cleveland and Summit.
Abstract: This session will describe Kudu, a new addition to the open source Hadoop ecosystem that complements HDFS and HBase by providing both fast scans and fast random access through a single API.
Over the past several years, the Apache Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap with traditional database technologies. With systems such as Apache Impala (incubating) and Apache Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily sized datasets.
Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but offer little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but their scan rates are too slow for large-scale data warehousing workloads.
This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals, and explain how Apache Kudu addresses many of these challenges.
Speaker: Todd Lipcon has been a software engineer at Cloudera since early 2009, working on various parts of the open source Apache Hadoop ecosystem. From 2009 to 2012, he focused on Apache HBase, HDFS, and MapReduce, where he designed and implemented redundant metadata storage for the NameNode (QuorumJournalManager), ZooKeeper-based automatic failover, and numerous performance, durability, and stability improvements. In 2012, Todd founded the Apache Kudu project and has spent the last three years leading this team. Todd is a committer and PMC member on Apache HBase, Hadoop, Thrift, and Kudu, as well as a Member of the Apache Software Foundation.
Parking: Anderson Ramp (pay)
Map from Parking to Event: http://bit.ly/2fiw1QE
Food: Pizza and drinks provided by Cloudera, first come, first served, starting at 6:30 PM.