Details

We have teamed up with the Chicago Data Engineering meetup group (https://www.meetup.com/Chicago-Data-Engineering/) to host two technical deep-dive talks on Google Big Data and Apache Ignite for Spark. QuantumBlack has agreed to sponsor our February Meetup. Expect lots of networking, along with pizza and beer!

Please bring an ID to get into the building.

Agenda:
6:00pm - 6:30pm: Networking and snacks

6:30pm - 7:00pm: "How to Speed Up Spark SQL With In-Memory Computing Stack?" talk by Denis Magda

7:00pm - 7:30pm: "Google Big Table - Store data at scale" talk by Piyush Sanghi

7:30pm - 8:00pm: Q&A with Speakers and QuantumBlack

Talk #1: How to Speed Up Spark SQL With In-Memory Computing Stack?

With Spark SQL, which is built on the Catalyst optimizer, we can query and join various data sources, including Hive, relational databases, Avro, and Parquet. Catalyst’s extensible design lets us add data source-specific rules that push the execution of aggregations and filters down into external storage systems. Such optimizations speed up Spark SQL operations significantly by reducing data shuffling between Spark workers and an external data source.
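
As a rough illustration of why pushing a filter down to the source helps, here is a toy sketch in plain Python. It is not Spark, Catalyst, or Ignite code: the `scan` function, the `ROWS` data, and the `is_chicago` predicate are all hypothetical stand-ins, and the only point is that filtering at the source shrinks the number of rows that have to travel to the query engine.

```python
# Toy model of filter pushdown. Nothing here is a real Spark or Ignite API.

# Hypothetical external table: 1000 rows, every fourth one tagged "Chicago".
ROWS = [{"id": i, "city": "Chicago" if i % 4 == 0 else "Other"} for i in range(1000)]


def scan(predicate=None):
    """Hypothetical external source.

    Returns the rows that would be sent to the query engine. When a
    predicate is supplied, it is evaluated at the source before transfer.
    """
    if predicate is None:
        return list(ROWS)
    return [row for row in ROWS if predicate(row)]


def is_chicago(row):
    return row["city"] == "Chicago"


# Without pushdown: every row is transferred, then filtered by the engine.
transferred_all = scan()
result_no_pushdown = [row for row in transferred_all if is_chicago(row)]

# With pushdown: the source applies the filter, so fewer rows are transferred.
transferred_pushed = scan(is_chicago)
result_pushdown = transferred_pushed

print(len(transferred_all), len(transferred_pushed))
```

Both approaches return the same rows; only the amount of data moved between the source and the engine differs.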

This talk explains how Apache Ignite’s in-memory store and internal SQL engine were integrated into the Catalyst optimizer to accelerate real-time analytics workloads with a high-performance in-memory computing stack. We’ll start from the basics, showing how to gain a performance boost by merely running Spark and Ignite together. Next, we’ll dive into more sophisticated optimizations to achieve an order-of-magnitude increase.

Speaker Bio:
Denis Magda is an open-source enthusiast who started his journey at Sun Microsystems as a developer advocate and has now settled at the Apache Software Foundation as an Apache Ignite committer and PMC member. He is an expert in distributed systems and platforms who actively contributes to Apache Ignite and helps companies build successful open-source projects. You can be sure to come across Denis at conferences, workshops, and other events, sharing his knowledge about open source, community building, and distributed systems.

Talk #2: Google Big Table - Store data at scale

Abstract:
With data being collected at massive speeds, we need new ways of persisting data at scale. Enter Google Big Table, which provides sub-10 ms latency and scales to petabytes. We will look at how Google Big Table scaling works and when to use it.

Speaker Bio:
Piyush is a Big Data Engineer and a recent graduate of the University of Chicago. He has over 9 years of professional experience solving technical problems for the finance industry. Presently at TransUnion, Piyush is building a self-serve analytics platform using Spark, Docker, and Kubernetes. Prior to joining TransUnion, Piyush worked at Bank of America and Accenture.

---
Sponsors:
This event is sponsored by QuantumBlack, a McKinsey Company.
