From Matei Zaharia:
"As big data becomes a concern for more organizations, there is a need for both faster tools to process it and easier-to-use APIs. Apache Spark is a Hadoop-compatible cluster computing engine that addresses these needs through (1) in-memory computing primitives that let it run 100x faster than Hadoop and (2) high-level APIs in Scala, Java and Python. In the past few years, Spark has quickly grown to be one of the most active projects in the big data space, with over 25 companies contributing, and a developer community second in size only to Hadoop. This talk will introduce the Spark programming model and API, show you how to get started using it, and talk about use cases in the community. Finally, we’ll cover the growing stack of higher-level tools built on top of Spark, including Spark Streaming for real-time processing, Shark for SQL, GraphX, and MLlib."
Pizza and drinks will be provided.
About the speaker:
Matei Zaharia is the creator of Apache Spark and is joining MIT CSAIL as an assistant professor next year. He recently completed his PhD at UC Berkeley, where he worked closely with the open source big data ecosystem and became a committer on Apache Mesos and Hadoop. He is currently on leave to start Databricks, a company built around Spark, where he is CTO.