Spark, Scala, and the Berkeley Data Analytics Stack.

This is a past event

136 people went


IMPORTANT: Please register at SkillsMatter.


Patrick Wendell

This talk will introduce Apache Spark. Spark is a cluster computing engine that lets users concisely express a wide range of applications through APIs in Scala, Java, and Python. Under the hood, Spark is written primarily in Scala. It supports streaming, batch, and interactive analytics on very large datasets. Thanks to its support for in-memory storage and general operator graphs, it can run up to 100x faster than Hadoop for complex algorithms such as machine learning and graph processing.

This talk will give an overview of Spark and provide reflections on writing a large production application in Scala. Spark has spawned a variety of related projects, which will also be covered briefly, including a SQL execution engine (Shark), a graph computing library (GraphX), and a machine learning library (MLlib).

Patrick Wendell is a committer on Apache Spark and a co-founder of Databricks. Before Databricks, he was pursuing a Ph.D. in the UC Berkeley AMPLab, advised by Ion Stoica. His research focused on scalable, low-latency scheduling for data processing frameworks. In the past, he has contributed to several Hadoop projects, including Apache Flume and Apache Avro. He holds a B.S. in Computer Science from Princeton University and an M.S. in Computer Science from UC Berkeley.

We will, as always, also be heading to the Slaughtered Lamb pub afterwards.


Skills Matter are hosting this event and handling attendance, so it is essential that you confirm your place at this link:

Failure to do so may result in not getting a seat. Please use "I'm going" on this page only to let the others in the group know you're going.

If this is your first time at SkillsMatter, directions are: