Spark Summit committer night


Details
Spark Committer night! Food. Drink. Spark! The tradition continues. We'll have several speakers delving into Spark with us, and we hope to see you there. -Matt & Francois
Reynold Xin, Spark PMC member and Chief Architect for Spark at Databricks, will give an introduction, followed by 3 talks:
Nong Li from Databricks will present Spark Performance: What's Next
Ram Sriharsha from Hortonworks will present Magellan: Spark as a Geospatial Analytics Engine
Chris Fregly from IBM will present Real-time, Advanced Analytics and Recommendations using Machine Learning, Graph Processing, Natural Language Processing, and Approximations
Details
Nong Li from Databricks will present Spark Performance: What's Next
As part of the Tungsten project, Spark has started an ongoing effort to dramatically improve performance and bring execution closer to bare metal. In this talk, we’ll go over the progress made so far and the areas we’re looking to invest in next. The talk will cover the architectural changes being made and how Spark users can expect their applications to benefit from this effort. The focus will be on Spark SQL, but the improvements are general and apply to multiple Spark technologies.
Ram Sriharsha from Hortonworks will present Magellan: Spark as a Geospatial Analytics Engine
Suppose you have a large volume of point-in-space data (think mobile GPS coordinates). You want to join this dataset with shapes (be it neighborhoods in New York boroughs, the road system in NYC, the canal systems in Amsterdam, what have you). How do you do this join at scale? Our goal in this talk is to show how we are solving this problem using Magellan and Spark. Magellan is a newly open-sourced geospatial analytics engine written on top of Spark and is the first such engine to deeply leverage Spark SQL, DataFrames, and Catalyst for spatial analytics. This talk will focus on one specific aspect of Magellan: how does Magellan implement spatial joins, and where does it leverage Spark SQL for efficiency and simplicity? The talk should be of interest to developers who wish to understand how to leverage Spark SQL in richer ways than before, those interested in writing specialized analytics engines on top of Spark SQL, and data scientists and data engineers who wish to perform spatial analytics processing or predictive analytics on geospatial datasets at scale.
Chris Fregly from IBM will present Real-time, Advanced Analytics and Recommendations using Machine Learning, Graph Processing, Natural Language Processing, and Approximations
BONUS: Netflix Recommendations: Then and Now
Agenda
Mingling from 6:30-7 and again from 8-9
Talks begin @ 7
Bios
Reynold Xin is a Spark PMC member and Chief Architect for Spark at Databricks. He will give the introduction.
Nong Li is a software engineer working at Databricks on Spark core and Spark SQL with a focus on performance-related work. Prior to Databricks, Nong worked at Cloudera on the Impala project on the core execution engine and was the tech lead of the Record Service project.
Ram Sriharsha is a Senior Member of Technical Staff at Hortonworks, focused on Spark, Machine Learning, and Data Science. Ram is an Apache Spark Committer and PMC Member. Prior to joining Hortonworks, he was Principal Research Scientist at Yahoo Research where he worked on large scale machine learning algorithms and systems related to login risk detection, sponsored search advertising, and advertising effectiveness measurement.
Chris Fregly is a Principal Data Solutions Engineer for the newly-formed IBM Spark Technology Center, an Apache Spark Contributor, and a Netflix Open Source Committer. He is also the founder of the global Advanced Apache Spark Meetup and author of the upcoming book, Advanced Spark @ advancedspark.com. Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.
