CDC and Building a Streaming Analytics Stack with Kafka and Druid


E*Trade

4500 Bohannon Drive · Menlo Park, CA


Please provide your full name below and bring ID with you! https://docs.google.com/forms/d/e/1FAIpQLScolWPmTPbB1LtdLGts3cFmFlytoUVVUyMeUVrfJDx9ZXuFGQ/viewform?fbzx=8738248503310828350


Details

***************** FOR ENTRY *******************
PLEASE RSVP WITH YOUR FULL NAME TO ENSURE ENTRANCE TO THE BUILDING, PER THE SECURITY REQUEST FROM THE HOSTS.

Join us for an Apache Kafka meetup on January 22nd from 6pm, hosted by E*Trade in Menlo Park. The address, agenda and speaker information can be found below. See you there!

Enter your full name here and then RSVP via meetup.com: https://docs.google.com/forms/d/e/1FAIpQLScolWPmTPbB1LtdLGts3cFmFlytoUVVUyMeUVrfJDx9ZXuFGQ/viewform?fbzx=8738248503310828350

Thank you!

-----

Agenda:
6:30pm: Doors open
6:30pm - 7:00pm: Networking, Pizza and Drinks
7:00pm - 7:30pm: Presentation #1: Stop letting your data rest and make it stream, Samer Abraham, E*Trade
7:30pm - 8:15pm: Presentation #2: Building a Streaming Analytics Stack with Kafka and Druid, Gian Merlino, Imply
8:15pm - 8:45pm: Additional Q&A and Networking

-----

First Talk

Speaker:
Samer Abraham
VP of Software Engineering, E*TRADE

Bio:
Samer Abraham is a Vice President of Software Engineering at E*TRADE Financial, the original online financial services disruptor. Samer currently manages E*Trade's Investor suite of applications, including roboadvisors and retirement planning tools. Samer has a passion for data and has been helping to lead the push towards streaming data using Kafka at E*TRADE. Previously, Samer led web development and data analytics teams at Morgan Stanley. When not streaming, Samer is usually smoking (i.e., cooking) traditional American barbecue.

Title:
Stop letting your data rest and make it stream

Abstract:
Data is everywhere. However, it isn't always where you want it to be.

Most of us attending this Meetup want our data flowing through a streaming platform, namely Kafka. But outside of the data you own and produce, most applications need content from various data sources that may not be streaming data at all. While at first this may seem true only for brownfield projects, those working on greenfield projects often find the need to consume data from legacy or third-party systems.

In this talk, we will explore E*Trade's journey, and the lessons learned along the way, in taking our diverse data sources and distributing them via streams, where industry-standard protocols and content delivery mechanisms bring interesting challenges to the table.

---

Second Talk

Speaker:
Gian Merlino

Bio:
Gian is a cofounder and CTO of Imply, a San Francisco based technology company. Gian is also one of the main committers of Druid. Previously, Gian led the data ingestion team at Metamarkets (now a part of Snapchat) and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.

Title:
Building a Streaming Analytics Stack with Kafka and Druid

Abstract:
The maturation and development of open source technologies has made it easier than ever for companies to derive insights from vast quantities of data. In this talk, we will cover how data analytic stacks have evolved from data warehouses, to data lakes, and to more modern stream-oriented analytic stacks. We will also discuss building such a stack using Apache Kafka and Apache Druid.

Analytics pipelines running purely on Hadoop can suffer from hours of data lag. Initial attempts to solve this problem often lead to inflexible solutions, where the queries must be known ahead of time, or fragile solutions where the integrity of the data cannot be assured. Combining Hadoop with Kafka and Druid can guarantee system availability, maintain data integrity, and support fast and flexible queries.

In the described system, Kafka provides a fast message bus and is the delivery point for machine-generated event streams. Kafka Streams can be used to transform data before loading it into Druid. Druid provides flexible, highly available, low-latency queries.
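As an illustrative sketch (not code from the talk, and with hypothetical field names), the transform-before-load step might look like this: a function that flattens a nested raw event into the flat, timestamped row shape Druid ingests. In a real pipeline this logic would run inside a Kafka Streams or consumer job, writing its output to a topic Druid ingests from.

```python
import json
from datetime import datetime, timezone

def flatten_event(raw: bytes) -> dict:
    """Flatten a nested machine-generated event into a flat,
    timestamped row for Druid ingestion. (Illustrative only;
    all field names here are hypothetical.)"""
    event = json.loads(raw)
    return {
        # Druid requires an explicit timestamp column.
        "__time": datetime.fromtimestamp(
            event["ts_ms"] / 1000, tz=timezone.utc
        ).isoformat(),
        # Dimensions: flattened out of the nested payload.
        "user": event["user"]["id"],
        "action": event["action"],
        # Metric: a numeric column Druid can aggregate at query time.
        "latency_ms": float(event["latency_ms"]),
    }

# Example: one raw event, as it might arrive on a Kafka topic.
raw = json.dumps({
    "ts_ms": 1548115200000,          # 2019-01-22T00:00:00Z
    "user": {"id": "u-42"},
    "action": "quote_view",
    "latency_ms": 12,
}).encode()
print(flatten_event(raw))
```

The point of the transform is that Druid works best with flat, denormalized rows; doing the flattening in the stream keeps the Druid ingestion spec simple.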

This talk is based on our real-world experiences building out such a stack for many use cases across many industries.