Apache Spark has been one of the most popular general-purpose distributed systems of the past few years. It has APIs in Scala, Java, and Python, and more recently there have been a few different attempts to provide support for R, C#, and Julia. This talk looks at Apache Spark from a performance/scaling point of view and the work we need to do to be able to handle [...]