Scalding: powerful and concise MapReduce programming
In this talk we introduce Scalding, an open-source Scala DSL for Apache Hadoop.
Scala is a functional programming language on the JVM. Hadoop uses a functional programming model to represent large-scale distributed computation. Scala is thus a very natural match for Hadoop. Twitter uses Scalding for data analysis and machine learning, particularly in cases where we need more than SQL-like queries on the logs, for instance fitting models and matrix processing. It scales beautifully from simple, grep-like jobs all the way up to jobs with hundreds of MapReduce pairs. This talk will present the Scalding DSL and show some example jobs for common use cases.
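To make the "natural match" concrete, here is a minimal sketch of a word count written with ordinary Scala collections. It is not the Scalding API and runs on a single machine; the names `WordCountSketch`, `tokenize`, and `wordCount` are made up for illustration. The point is only that the map, group, and reduce steps of a MapReduce job line up with familiar functional operations.

```scala
// Plain Scala standard library only; this is NOT the Scalding DSL.
// All names here are hypothetical and chosen for illustration.
object WordCountSketch {
  // "Map" phase: turn one input line into zero or more lowercase words.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // Group by key (the "shuffle"), then "reduce" each group to its size.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello hadoop", "hello scala")
    // Print one tab-separated "word count" pair per line, sorted by word.
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (word, n) =>
      println(s"$word\t$n")
    }
  }
}
```

A DSL such as Scalding lets you express a pipeline of roughly this shape while the grouping and counting run as Hadoop jobs across a cluster rather than in memory.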
I gave a few talks at conferences while in grad school. By the time this conference takes place, I will have given this talk quite a few times.
Argyris is a data scientist on the revenue quality team at Twitter. Previously he was a cofounder of AdGrok. He originally came to the US to do a PhD at Stanford.
When not geeking out with data, Argyris enjoys DJing at KZSU 90.1FM.