
Spark Meetup at Strata

Hosted By
Scott W.

Details

We will hold a meetup on Tuesday during Strata.

6:30-7pm: Mingling
7-8:15pm: Tech talks
8:15-9pm: Mingling

Talk #1

Title: GraphFrames: DataFrame-based graphs for Apache Spark

Abstract:

GraphFrames bring the power of Apache Spark DataFrames to interactive analytics on graphs.

Expressive motif queries simplify pattern search in graphs, and DataFrame integration allows seamlessly mixing graph queries with Spark SQL and ML. By leveraging Catalyst and Tungsten, GraphFrames provide scalability and performance. Uniform language APIs expose the full functionality of GraphX to Java and Python users for the first time.

In this talk, the developers of the GraphFrames package will give an overview, a live demo, and a discussion of design decisions and future plans. This talk will be generally accessible, covering major improvements from GraphX and providing resources for getting started. A running example of analyzing flight delays will be used to explain the range of GraphFrame functionality: simple SQL and graph queries, motif finding, and powerful graph algorithms.

For experts, this talk will also include a few technical details on design decisions, the current implementation, and ongoing work on speed and performance optimizations.

Bio

Joseph Bradley is a Spark PMC member and MLlib maintainer, working as a Software Engineer at Databricks. Previously, he was a postdoc at UC Berkeley after receiving his Ph.D. in Machine Learning from Carnegie Mellon University in 2013. His research included probabilistic graphical models, parallel sparse regression, and aggregation mechanisms for peer grading in MOOCs.

Talk #2

Title: Livy: A Unified REST Web Service for Apache Spark

Abstract:

Livy is an open source (Apache License) REST web service that manages long running Spark contexts in your cluster. By utilizing Livy, clients can easily submit:

  1. Spark jobs programmatically using a thin client
  2. Spark code snippets that are compiled and run in the cluster
  3. Entire Spark applications as JARs

Livy effectively makes it possible to build both interactive web/mobile applications and multi-tenant notebooks.

In this talk, attendees will get a brief overview of Livy, its architecture, APIs, and future work, and learn how they can contribute.
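For readers who want a feel for the REST interface before the talk, here is a minimal sketch of two of the submission modes listed above: running a code snippet in an interactive session and submitting a whole application as a JAR. It only builds the HTTP requests and does not send them. The server address, session id, JAR path, and class name are all assumptions made for illustration, and the thin-client mode (which uses Livy's programmatic client rather than plain REST calls) is not shown.

```python
import json
from urllib.request import Request

# Assumption: a Livy server reachable locally on its default port.
LIVY_URL = "http://localhost:8998"


def livy_post(path, payload):
    """Build (but do not send) a JSON POST request for a Livy endpoint."""
    return Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Start an interactive Scala session.
create_session = livy_post("/sessions", {"kind": "spark"})

# Run a code snippet in that session. Session id 0 is hypothetical;
# in practice it comes from the response to the request above.
run_snippet = livy_post(
    "/sessions/0/statements",
    {"code": "sc.parallelize(1 to 100).sum()"},
)

# Submit an entire application as a JAR. The path and class name
# are placeholders, not real artifacts.
submit_batch = livy_post(
    "/batches",
    {"file": "/path/to/app.jar", "className": "com.example.MyApp"},
)

for req in (create_session, run_snippet, submit_batch):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

To actually send a request, pass it to `urllib.request.urlopen` against a running Livy server and read the JSON response for the session or batch id.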

Bio

Anand Iyer is a senior product manager at Cloudera. His primary areas of focus are platforms for real-time streaming, Apache Spark, and tools for data ingestion into the Hadoop platform. Before joining Cloudera, he worked as an engineer at LinkedIn, where he applied machine learning techniques to improve the relevance and personalization of LinkedIn’s Feed. He has extensive experience in leveraging big data platforms to deliver products that delight customers. He has a master’s in computer science from Stanford and a bachelor’s from the University of Arizona.

Bay Area Spark Meetup
San Jose Convention Center Room 210A/E
150 W. San Carlos · San Jose, CA