Spark 2.0 Performance Improvements & Blazegraph GPU


Details
Overview
Please join us for an awesome evening of networking and interactive talks! Databricks (https://databricks.com/) will present the latest performance improvements in the Spark 2.0 release, and Blazegraph (https://www.blazegraph.com/) will present how to use their GPU and DASL product to perform graph and machine learning (ML) analytics at a large scale with Spark and GPU clusters. I hope to see everyone there!
Please note: this is a privately funded event, and recruiting is NOT allowed.
Meetup Agenda:
5:00 – 6:00 – Live DJ, Networking, Happy Hour, Pizza
6:00 – 7:00 – Databricks presents Performance Improvements in Spark 2.0
7:00 – 8:30 – Blazegraph presents GPU and DASL with ML and Spark
Meetup Talks:
Performance Improvements in Apache Spark 2.0
Presented by Markus Dale at Databricks.
In this talk we will explore the latest Project Tungsten updates to Spark, which further improve Spark performance through whole-stage code generation and vectorization. We will show how to take advantage of these improvements by using Datasets and DataFrames.
GPU-Accelerated Graph Analytics with Apache Spark
No CUDA required! Learn how to use Blazegraph GPU and DASL to perform graph and machine learning (ML) analytics at scale with Spark and GPU clusters. Blazegraph GPU accelerates SPARQL graph queries with a 200-300X speed-up. Blazegraph DASL (“dazzle”) is an analytics platform that combines the ease of Spark with the speed of CUDA; it is up to 1000X faster than Spark without GPUs. It provides operators in the Scala programming language for graph and machine learning algorithms expressed as programs and workflows over linear algebra primitives. DASL programs are translated into task graphs that expose the available parallelism. The Blazegraph DASL mapgraph runtime evaluates the task graphs and provides a distributed execution environment that scales on GPUs and GPU clusters. We will present and demonstrate DASL with Apache Spark, running PageRank over 140 million Netflows.
Attendees will learn how to:
● Apply DASL and linear algebra techniques to graph algorithms.
● Process their data in Spark DataFrames and load it into the DASL runtime.
● Execute an example DASL PageRank, then extend and customize it.
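For readers who have not seen a graph algorithm written over linear algebra primitives, here is a minimal sketch of PageRank as a repeated sparse matrix-vector multiply. It is plain Python, not DASL or any Blazegraph API; the function name, the parameters, and the tiny edge list are illustrative assumptions, and the talk's actual implementation may differ.

```python
def pagerank(edges, num_nodes, damping=0.85, iterations=50):
    """PageRank by power iteration, written as a sparse matrix-vector product.

    Illustrative only: `edges` is a list of (source, destination) node ids,
    which stands in for the nonzeros of the graph's adjacency matrix.
    """
    # Out-degree of each node, used to column-normalize the adjacency matrix.
    out_degree = [0] * num_nodes
    for src, _dst in edges:
        out_degree[src] += 1

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread over all nodes.
        dangling = sum(rank[n] for n in range(num_nodes) if out_degree[n] == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes
        # Sparse matrix-vector multiply: one multiply-add per edge (nonzero).
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Hypothetical flow records reduced to (source, destination) node ids.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank(edges, num_nodes=4)
print([round(r, 3) for r in ranks])
```

Each iteration touches every edge exactly once and the per-edge updates are independent of one another, which is the kind of parallelism a GPU runtime can exploit.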
Speaker Bios:
Markus Dale of Databricks
Markus Dale is a solutions architect with Databricks, where he helps customers solve big data problems with managed Spark in a collaborative, cloud-based environment. He previously served as a senior software developer with DoD, where he focused on large-scale data processing in Hadoop and Spark. He also developed and taught a Hadoop for Developers class for UMBC Training. He has presented on Hadoop, Spark, and AWS at meetups in MD and VA. His blog and earlier presentations can be found at http://uebercomputing.com.
Brad Bebee
Brad Bebee is the CEO of Blazegraph, leading efforts to deliver graphs at scale with Blazegraph technologies. An expert in graphs and large-scale analytics, his background includes software development, telecommunications, and information retrieval. He has implemented large scale analytics using Hadoop and Accumulo. He is leading the integration of Blazegraph’s GPU technologies for graph analytics into business and mission applications.
Jim Carbonaro
Jim Carbonaro is a subject matter expert for integration and scaling of Blazegraph solutions with real-time analytic processing frameworks, including Spark, Scala, Storm, Kafka, GraphX, and Redis. He is a lead developer of DASL and DASL algorithms for large-scale graph analytics. He led recent work to compare the performance of Apache Spark GraphX with Blazegraph-accelerated graph analytics on the analysis of Netflow data.
James Lewis
James Lewis is a CUDA researcher with SYSTAP. He is the lead developer for Blazegraph GPU. He wrote the initial version of the software that uses SpMV techniques to implement SPARQL query evaluation on the GPU. He was the lead CUDA developer for integrating Mapgraph technology with the Merlin Application to accelerate Electronic Warfare using GPU graph capabilities. In this role, James exposed the graph capabilities on the GPU via a Java Native Interface (JNI) to enable the integration without the application developer writing any CUDA, C++, or other non-Java code. He studied at the University of Utah Scientific Computing Institute (SCI), where he received B.S. degrees in both Computer Science and Applied Mathematics as well as an M.S. in Computing. In his research work, James developed graph topological metrics to evaluate the performance of aggregation methods in the context of multigrid coarsening. He implemented parallel aggregation techniques for multigrid coarsening in C++ and CUDA.
