BASM @ Bloomberg in San Francisco


Details
Join us for an evening at the Bay Area Apache Spark Meetup, featuring tech talks on Apache Spark from Bloomberg (https://www.bloomberg.com/) and Databricks (https://databricks.com/).
Thanks to Bloomberg (https://www.bloomberg.com/) for hosting and sponsoring this meetup.
Bloomberg building security requires you to fill out this form if you RSVP:
https://goo.gl/forms/wjeDeg6HPLAeeXIC3
Agenda:
6:00 - 6:30 pm Mingling & Refreshments
6:30 - 6:35 pm Welcome, opening remarks, announcements, acknowledgments, and introductions
6:35 - 7:15 pm Bloomberg Ilan Filonenko: Apache Spark on K8s and HDFS Security
7:20 - 8:00 pm Databricks Jules S. Damji: What’s New in Apache Spark 2.3 and Why Should You Care
8:00 - 8:30 pm Mingling
Tech-Talk 1: Apache Spark on K8s and HDFS Security
Abstract: There is growing interest in running Apache Spark natively on Kubernetes. Ilan Filonenko will explain the design idioms, architecture, and internal mechanics of Spark orchestration on Kubernetes. Since data for Spark analytics is often stored in HDFS, Ilan will also explain how to make Spark on Kubernetes work seamlessly with HDFS by addressing challenges such as data locality and security through Kubernetes constructs such as secrets and RBAC rules.
Bio: Ilan Filonenko, Software Engineer, Bloomberg
Ilan Filonenko is a member of the Data Science Infrastructure team at Bloomberg, where he has designed and implemented distributed systems at both the application and infrastructure level.
Previously, Ilan was an engineering consultant and technical lead in various startups and research divisions across multiple industry verticals, including medicine, hospitality, finance, and music. Ilan’s research has focused on algorithmic, software, and hardware techniques for high-performance machine learning, with a focus on optimizing stochastic algorithms such as stochastic gradient descent (SGD).
Tech-Talk 2: What’s New in Apache Spark 2.3 and Why Should You Care
Abstract: The Apache Spark 2.3 release marks a big step forward in speed, unification, and API support.
This talk will quickly walk through what’s new and how you can benefit from the upcoming improvements:
- Continuous Processing in Structured Streaming.
- PySpark support for vectorization, giving Python developers the ability to run native Python code fast.
- Native Kubernetes support, marrying the best of container orchestration and distributed data processing.
Bio: Jules S. Damji is an Apache Spark Community and Developer Advocate at Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, LoudCloud/Opsware, VeriSign, ProQuest, and Hortonworks, building large-scale distributed systems.
He holds a B.Sc. and M.Sc. in Computer Science and an MA in Political Advocacy and Communication from Oregon State University, Cal State, and Johns Hopkins University, respectively.
