
Scale Out and Optimize Spark 2.0

Hosted by Brian H.

Details

Overview

Please join us for an awesome evening of networking and interactive talks! Databricks (https://databricks.com/) will present a code-heavy talk on partitioning, shuffling, caching, and enhancements in Spark 2.0. FedCentric (http://www.fedcentric.com/) will present on using Apache Spark in a Scale-Up UV environment, with details on how they tuned and optimized Spark. I hope to see everyone there; come learn how to get the most out of your Spark applications!

Please note: this is a privately funded event, and recruiting is NOT allowed.

Meetup Agenda:

5:00 – 6:00 – Live DJ, Networking, Happy Hour, Pizza

6:00 – 7:00 – Databricks: Optimizing Spark 2.0 Applications

7:00 – 8:00 – FedCentric: Spark in a Scale-Up UV Environment

Meetup Talks:

Databricks Presents: Optimizing your Apache Spark 2.0 Applications

This talk will cover core topics related to the stability and performance of Apache Spark applications, such as partitioning, shuffling, caching, and enhancements in Spark 2.0. We'll cover some considerations for migrating code from Spark 1.x to 2.0, as well as how best to take advantage of the performance improvements provided by the Catalyst Optimizer. This will be a code-heavy talk (no slides), so come prepared!

FedCentric Presents: Spark in a Scale-Up UV Environment

The release of Spark 2.0 has opened up new territory for performing rapid and complex queries on data beyond what a normal Hadoop environment can offer, giving developers the ability to scale both out and up. In this talk, we will focus on the performance gains to be found from using Apache Spark in a Scale-Up UV environment, as well as our experiences fine-tuning Spark to achieve these results.

Speaker Bios:

Silvio Fiorito is a Solutions Architect with Databricks based in Northern VA, helping customers build and deploy their Apache Spark applications. He's worked as a developer and application security engineer in the DC area for almost 20 years, using a variety of tools and languages in both the private and federal sectors. He's been using Spark for about 4 years (since v0.6, when it was still an AMPLab project) for digital marketing, security, finance, and other use cases. He was one of the first Databricks certified trainers on the East Coast and has presented multiple times at local meetups. He's contributed to the DataStax Spark Cassandra Connector as well as Apache Zeppelin, and developed a Power BI connector for easily visualizing data from Spark Streaming.

John Purtilo is a Project Manager at FedCentric Technologies LLC. His specialization is software engineering, particularly as it applies to in-memory and graph technologies in cybersecurity. Previously, John worked at the United States Naval Research Laboratory on medical imaging technologies. He is a graduate of the University of Maryland, College Park, where he earned a degree in Computer Science with Honors as well as the Gemstone Honors Citation.

Distributed Computing Maryland
Jailbreak Brewing Company
9445 Washington Blvd N Ste F · Laurel, MD