SciPy + MapReduce with Disco


MapReduce has become one of the two dominant paradigms in distributed computing (along with MPI). Yet implementing an algorithm as a MapReduce job, especially in Python, often forces us to sacrifice efficiency (BLAS routines, etc.) in favor of data parallelism. In my work, which involves writing distributed learning algorithms for processing terabytes of Twitter data at SocialFlow, I've come to advocate a form of "vectorized MapReduce" which integrates efficient numerical libraries like NumPy/SciPy into