Big Data Tutorial

Nov 29, 2012 · Recoleta, Argentina
Why take this tutorial?
This tutorial is for anyone who needs to know about big data: what it means and how to work with it (Spanish version). During this half-day session (1:30pm - 5:30pm ART) you will get answers to questions about big data in general and about the specific technologies used to solve big data problems, including:

Map-reduce with Hadoop
Workflows with Cascading and Hive
NoSQL storage with Cassandra and HBase
Real-time processing with Storm

This training will draw on the instructor's extensive real-world experience with Apple, Groupon, and many Bay Area startups to explain not only what these technologies do but, more importantly, how and when it makes sense to use them. The presentation will be in English, with printouts in Spanish for all content. Full price is US$150, with a limited number of discounted early-bird tickets available (see above).
Instructor Biography

Ken Krugler has been using Hadoop since its very beginning six years ago, and is an active architect, developer, and entrepreneur in the Big Data space. His current company (Scale Unlimited) provides training and consulting on Big Data, search and machine learning projects for companies both big and small. He regularly speaks at Hadoop Summits, Strata conferences, and user group events across the US and in Europe.

Who Should Attend?
This tutorial is appropriate for developers, managers, architects, or anyone who wants or needs to learn more about the complex and rapidly changing world of big data solutions. Prior knowledge of Hadoop, Cassandra, and other big data technologies is not required.
Course Outline

What exactly is big data?

Volume, velocity, variety

Data exhaust is now data gold

How to work with big data

New technologies to the rescue
Hadoop as foundation
Hadoop ecosystem
Cloud-based servers

Essential big data skills


Map-reduce with Hadoop

Storage and execution at scale

Hadoop Distributed File System (HDFS)

Hadoop Map-reduce
When does Hadoop make sense?

Workflows with Cascading and Hive

High-level data processing abstractions
Using Cascading for complex ETL problems
Using Hive for ad hoc queries

NoSQL storage with Cassandra and HBase

The reasoning behind the NoSQL movement
Modeling your data in a flat world
Cassandra in 10 minutes or less
HBase in a nutshell
Solr as a NoSQL solution

Real-time processing with Storm

What to do when batch processing isn't acceptable
How Storm provides scaling and reliability
Challenges with creating Storm-based workflows
