Meetup At SAP


Details
6:30-7 Mingling (refreshments will be served)
7-7:05 Introductions
7:05-8:30 Tech Talks
8:30-8:45 Mingling
Talk 1:
Contextual Awareness with SAP HANA Vora
A big challenge for large enterprise companies is working across different data domains. A tight integration between the query processing in relational database management systems and a big data framework such as Spark is mandatory for exploiting the full value of the stored data. This talk describes how SAP HANA Vora bridges the gap between different data domains.
Bio:
Christian Tinnefeld is a research manager in the SAP HANA Vora team in Palo Alto, CA, US. Before joining SAP, he received his B.Sc. and M.Sc. degrees from the Hasso Plattner Institute at the University of Potsdam, Germany, where he also pursued his doctoral studies. His main research interests are In-Memory Databases and Cloud Computing.
Talk 2:
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Apache Spark 2.0 will ship with the second-generation Tungsten engine. Building upon ideas from modern compilers and MPP databases, and applying them to data processing queries, we have started an ongoing effort to dramatically improve Spark’s performance and bring execution closer to bare metal. In this talk, we’ll take a deep dive into Apache Spark 2.0’s execution engine and discuss a number of architectural changes around whole-stage code generation/vectorization that have been instrumental in improving CPU efficiency and gaining performance.
Bio:
Sameer Agarwal is a Software Engineer at Databricks working on Spark core and Spark SQL. Previously, he received his PhD in Databases from the UC Berkeley AMPLab, where he worked on BlinkDB, an approximate query engine for Spark.
Talk 3:
Apache Spark™ has rapidly become a key tool for data scientists to explore, understand and transform massive datasets and to build and train advanced machine learning models. The question then becomes: how do I deploy these models to a production environment? How do I embed what I have learned into customer-facing data applications? Like all things in engineering, it depends.
In this meetup, we will discuss best practices from Databricks on how our customers productionize machine learning models and do a deep dive with actual customer case studies and live demos of a few example architectures and code in Python and Scala. We will also briefly touch on what is coming in Apache Spark 2.X with model serialization and scoring options.
Bio:
Our speaker today will be Richard Garris. Richard is a Sr. Solution Architect at Databricks focused on Advanced Analytics and Machine Learning. He was previously a Solution Architect with Skytree, the Machine Learning Company, and spent time at Google, Twitter and PricewaterhouseCoopers as an Advanced Analytics Consultant.
