Apache Spark Tutorial with Paco Nathan


Details
This two-hour, hands-on introduction to Apache Spark covers the Spark Core API, use of the Spark Shell, Spark Streaming, Spark SQL, MLlib, GraphX, cloud-based notebooks, and more.
The workshop features hands-on technical exercises to get up to speed using Spark for data exploration, analysis, and building Big Data applications. The exercises are intermixed with brief technical talks targeted at people who are new to Spark.
* How to write simple Spark applications
* A brief history of Big Data
* Where Spark fits in the open source landscape
* Theory of operation on a cluster
* Combining SQL, Machine Learning, and Streaming for unified workflows
* Cloud-based notebooks for team collaboration and data visualization
* Case studies of large-scale production deployments
* Active areas of Spark-related research
* Community resources for further study
6pm: people gather/mingle
6:30pm: workshop begins
8:30pm: shift to Q&A, more mingling, perhaps some extended demos
[ prerequisites: ]
Some experience coding with Python, Scala, or SQL, as well as some familiarity with data analytics.
Bring a laptop with wifi and a browser, and reasonably current hardware (2GB+ RAM):
* Mac OS X, Windows, and Linux all work fine
* make sure you do not have corporate security controls that prevent network access
* have Java JDK 6, 7, or 8 installed
* have Python 2.7 installed
* have at least two hours of battery life
NB: do not install Spark with Homebrew or Cygwin.
[ bio: ]
Paco Nathan (http://www.oreilly.com/pub/au/1927) is a "player/coach" who has led innovative Data teams building large-scale apps for several years. His expertise includes distributed systems, machine learning, functional programming, and cloud computing. Paco is an O'Reilly (http://www.oreilly.com/pub/au/1927) author, an Apache Spark (http://spark.apache.org/) open source evangelist with Databricks (http://databricks.com/), and an advisor for Amplify Partners (http://www.amplifypartners.com/) and GalvanizeU (http://www.galvanizeu.com/). He received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 30+ years of technology industry experience ranging from Bell Labs to early-stage start-ups. He was cited in 2015 as one of the Top 30 People in Big Data and Analytics (http://www.kdnuggets.com/2015/02/top-30-people-big-data-analytics.html) by Innovation Enterprise.
Parking information: http://tinyurl.com/kkjq6qy