Spark Meetup at Strata


Details
We will be holding a meetup on Tuesday during Strata.
6:30-7pm: Mingling
7-8:15pm: Tech talks
8:15-9pm: Mingling
Talk #1
Title: GraphFrames: DataFrame-based graphs for Apache Spark
Abstract:
GraphFrames bring the power of Apache Spark DataFrames to interactive analytics on graphs.
Expressive motif queries simplify pattern search in graphs, and DataFrame integration allows seamlessly mixing graph queries with Spark SQL and ML. By leveraging Catalyst and Tungsten, GraphFrames provide scalability and performance. Uniform language APIs expose the full functionality of GraphX to Java and Python users for the first time.
In this talk, the developers of the GraphFrames package will give an overview, a live demo, and a discussion of design decisions and future plans. This talk will be generally accessible, covering major improvements from GraphX and providing resources for getting started. A running example of analyzing flight delays will be used to explain the range of GraphFrame functionality: simple SQL and graph queries, motif finding, and powerful graph algorithms.
For experts, this talk will also include a few technical details on design decisions, the current implementation, and ongoing work on speed and performance optimizations.
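To give a rough sense of what a motif query matches, here is a minimal plain-Python sketch of a two-hop pattern over flight edges. It is not the GraphFrames API and does not use Spark; the airports, delays, and the helper name `two_hop_delays` are made up for illustration. In GraphFrames itself, a pattern of this shape is written as a motif string such as `(a)-[e1]->(b); (b)-[e2]->(c)` and passed to `find`.

```python
# Plain-Python illustration of a two-hop motif; this is NOT the GraphFrames API.
# All airports and delay values below are hypothetical sample data.
from collections import defaultdict

# Each edge is one flight: (origin, destination, delay in minutes).
flights = [
    ("SFO", "ORD", 35),
    ("ORD", "JFK", 50),
    ("SFO", "SEA", 0),
    ("SEA", "JFK", 20),
    ("ORD", "SFO", 5),
]


def two_hop_delays(edges, min_delay):
    """Find itineraries a -> b -> c where both legs are delayed more than min_delay.

    Returns a list of (a, b, c, total_delay) tuples. Round trips (a == c) are skipped.
    """
    # Index outgoing flights by origin so the second leg can be looked up quickly.
    out_edges = defaultdict(list)
    for src, dst, delay in edges:
        out_edges[src].append((dst, delay))

    matches = []
    for a, b, first_delay in edges:
        for c, second_delay in out_edges[b]:
            if a != c and first_delay > min_delay and second_delay > min_delay:
                matches.append((a, b, c, first_delay + second_delay))
    return matches


badly_delayed = two_hop_delays(flights, min_delay=30)
print(badly_delayed)
```

The nested loop is effectively a self-join of the edge list on "destination of the first leg equals origin of the second", which is the kind of join a DataFrame-based engine can plan and optimize.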
Bio
Joseph Bradley is a Spark PMC member and MLlib maintainer, working as a Software Engineer at Databricks. Previously, he was a postdoc at UC Berkeley after receiving his Ph.D. in Machine Learning from Carnegie Mellon University in 2013. His research included probabilistic graphical models, parallel sparse regression, and aggregation mechanisms for peer grading in MOOCs.
Talk #2
Title: Livy: A Unified REST Web Service for Apache Spark
Abstract:
Livy is an open source (Apache License) REST web service that manages long-running Spark contexts in your cluster. With Livy, clients can easily submit:
1. Spark jobs programmatically using a thin client
2. Spark code snippets that are compiled and run in the cluster
3. Entire Spark applications as JARs
Livy effectively makes it possible to build both interactive web/mobile applications and multi-tenant notebooks.
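As a rough illustration of the snippet and JAR submission modes, the sketch below builds (but does not send) JSON requests against Livy's REST endpoints using only the Python standard library. The host and port, the session id `0`, the JAR path, and the class name are assumptions chosen for the example; the helper `livy_post` is hypothetical and not part of Livy. The programmatic job API via the thin client is not shown here.

```python
# Minimal sketch: prepare Livy REST requests without sending them.
import json
import urllib.request

# Assumption: a Livy server reachable locally on its default port.
LIVY_URL = "http://localhost:8998"


def livy_post(path, payload):
    """Build a JSON POST request for a Livy endpoint (hypothetical helper)."""
    return urllib.request.Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Start an interactive PySpark session.
create_session = livy_post("/sessions", {"kind": "pyspark"})

# Run a code snippet in an existing session (the id 0 is an assumption).
run_snippet = livy_post(
    "/sessions/0/statements",
    {"code": "sc.parallelize(range(100)).sum()"},
)

# Submit an entire application as a JAR (path and class name are placeholders).
submit_batch = livy_post(
    "/batches",
    {"file": "/path/to/app.jar", "className": "com.example.App"},
)

for request in (create_session, run_snippet, submit_batch):
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

To actually issue a request, pass it to `urllib.request.urlopen` while a Livy server is running. The response to a session request includes the session id to use in later statement URLs.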
In this talk, attendees will get a brief overview of Livy, its architecture, APIs, and future work, and learn how to contribute.
Bio
Anand Iyer is a senior product manager at Cloudera. His primary areas of focus are platforms for real-time streaming, Apache Spark, and tools for data ingestion into the Hadoop platform. Before joining Cloudera, he worked as an engineer at LinkedIn, where he applied machine learning techniques to improve the relevance and personalization of LinkedIn’s Feed. He has extensive experience in leveraging big data platforms to deliver products that delight customers. He has a master’s in computer science from Stanford and a bachelor’s from the University of Arizona.
