Why take this tutorial?
This tutorial is for anyone who needs to understand big data: what it means and how to work with it (versión en español). During this half-day session (1:30pm - 5:30pm ART) you will get answers to questions about big data in general, and about the specific technologies used to solve big data problems, including:
Map-reduce with Hadoop
Workflows with Cascading and Hive
NoSQL storage with Cassandra and HBase
Real-time processing with Storm
This training will draw on the instructor's extensive real-world experience with Apple, Groupon, and many Bay Area startups to explain not only what these technologies do but, more importantly, how and when it makes sense to use them.
The presentation will be in English, with printouts in Spanish for all content. Full price is US$150, with a limited number of discounted early-bird tickets available (see above).
Instructor Biography
Ken Krugler has been using Hadoop since its very beginning six years ago, and is an active architect, developer, and entrepreneur in the Big Data space. His current company (Scale Unlimited) provides training and consulting on Big Data, search and machine learning projects for companies both big and small. He regularly speaks at Hadoop Summits, Strata conferences, and user group events across the US and in Europe.
Who Should Attend?
This tutorial is appropriate for developers, managers, architects, or anyone who wants or needs to learn more about the complex and rapidly changing world of big data solutions. Prior knowledge of Hadoop, Cassandra, and other big data technologies is not required.
Course Outline
What exactly is big data?
Volume, velocity, variety
Data exhaust is now data gold
How to work with big data
New technologies to the rescue
Hadoop as foundation
Hadoop ecosystem
Cloud-based servers
Essential big data skills
Manager/Executive
Developer
Map-reduce with Hadoop
Storage and execution at scale
Hadoop Distributed File System (HDFS)
Hadoop Map-reduce
When does Hadoop make sense?
Workflows with Cascading and Hive
High-level data processing abstractions
Using Cascading for complex ETL problems
Using Hive for ad hoc queries
NoSQL storage with Cassandra and HBase
The reasoning behind the NoSQL movement
Modeling your data in a flat world
Cassandra in 10 minutes or less
HBase in a nutshell
Solr as a NoSQL solution
Real-time processing with Storm
What to do when batch processing isn't acceptable
How Storm provides scaling and reliability
Challenges with creating Storm-based workflows