Apache Spark in Four Parts

Photo of Brian Husted
Hosted By
Brian H.

Details

Please join us for an exciting evening of discussion and an opportunity to meet some of the thought leaders behind Apache Spark! Databricks was founded out of the UC Berkeley AMPLab by the creators of Apache Spark, and they will be presenting a four-part talk. I look forward to meeting everyone!

Schedule:

5:30 - 6:00 - Networking, Light Appetizers

6:00 - 8:00 - Presentations

8:00 - 8:30 - Q&A and Networking

About the talks:

Part 1: A brief intro to Spark: how it provides great performance and why, plus how Spark fits into the Big Data landscape. Recent design patterns are emerging for how use cases integrate Spark with other OSS frameworks, and we'll explore some of the drivers for those patterns.

Part 2: Prof. Reza Zadeh from Stanford ICME will present an intro to MLlib, plus discussion of recent work on SVD and all-pairs similarity. For work on machine learning at scale, e.g., recommenders, anomaly detection, etc., this is a game-changer. We'll look at release 1.1 and what's ahead for ML support.

Part 3: A quick demo of the new Databricks Cloud product,

http://databricks.com/product

Cloud-based notebooks written in Python, SQL, or Scala allow for interactive analysis, visualization, and curation of data, big or small, along with real-time collaboration with colleagues. These are backed by Spark clusters that can be provisioned and scaled dynamically, with fault tolerance, security, and resource isolation right out of the box.

Part 4: We will also discuss the new Spark Developer Certificate program with O'Reilly Media, and reserve plenty of time for Q&A.

About the speakers:

Reza Zadeh is a consulting professor at Stanford within ICME, conducting research and teaching courses for doctoral students. He focuses on Discrete Applied Mathematics, Machine Learning Theory and Applications, and Large-Scale Distributed Computing. For fun, he flies planes as a private pilot, climbs rocks as a PCIA instructor, and runs.

http://stanford.edu/~rezab/

Holden Karau is a software engineer at Databricks, active in open source, and the author of "Learning Spark" (O'Reilly) and "Fast Data Processing with Spark" (Packt). Prior to Databricks she worked on a variety of search and classification problems at Google, Foursquare, and Amazon. She has a Bachelor of Mathematics in Computer Science from the University of Waterloo.

http://www.holdenkarau.com/

Hossein Falaki is a software engineer at Databricks working on the next big thing. Prior to that he was a data scientist on Siri, Apple’s personal assistant. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing (CENS).

http://www.falaki.net/

Paco Nathan is an O'Reilly author ("Enterprise Data Workflows", "Just Enough Math"), Director of Community Evangelism at Databricks, and an advisor at Amplify Partners. Paco has 30+ years in the tech industry, ranging from Bell Labs to early-stage start-ups. He enjoys cycling in the Santa Cruz Mountains and crafting the best limoncello in California.

http://liber118.com/pxn/

Distributed Computing Maryland
The Hotel at Arundel Preserve
7795 Arundel Mills Boulevard · Hanover, MD