mrjob: Snakes on a Hadoop

0 0

This tutorial will take participants through basic usage of mrjob by writing analytics jobs over Yelp data. mrjob lets you easily write, run, and test distributed batch jobs in Python, on top of Hadoop. Hadoop is a MapReduce platform for processing big data but requires a fair amount of Java boilerplate. mrjob is an open source Python library written by Yelp used to process TBs of data every day.

PyCon US 2014

PyCon is the largest annual gathering for the community using and developing the open-source Python programming language. It is produced and underwritten by the Python Software Foundation, the 501(...