Spark overview and PySpark demo

Name: Spark overview and PySpark demo
Start: 2013-10-07T18:00:00-07:00
End: 2013-10-07T20:15:00-07:00
Location: Twilio HQ

Hosted By Code and Data

public group

Details

What is Apache Spark? (http://spark.incubator.apache.org/)

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

To make programming faster, Spark provides clean, concise APIs in Python (http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-job-in-python), Scala (http://www.scala-lang.org) and Java (http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-job-in-java). You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.

Josh Rosen from the UC Berkeley AMPLab will provide a big-picture overview of Spark, coupled with a live demo of PySpark on an EC2 cluster. The demo will be based on a tutorial that Fernando Perez wrote at AMP Camp on accessing PySpark through the IPython notebook (http://nbviewer.ipython.org/6384491/00-Setup-IPython-PySpark.ipynb).

See you there!

Monday, October 7, 2013
6:00 PM to 8:15 PM PDT

Twilio HQ

645 Harrison St. 3rd Floor · San Francisco, CA
