Past Meetup

Druid Bay Area Meetup @ Lyft


130 people went


Details

*** Notes ***

We are hosting this meetup together with the SF Big Analytics meetup group (https://www.meetup.com/SF-Big-Analytics/events/252678379/).

Lyft asks all attendees to register for the event on ti.to (free) before arriving. After you register, an eNDA will be sent to you; once you sign it, a badge will be pre-printed and waiting for you when you arrive. If you are unable to sign the eNDA online, you can still attend, but you may have to wait in line to sign in at the front desk.

Ti.to link: https://ti.to/big-data/big-data-systems-for-operational-analytics/with/vryrkr9c46c

*** Presentations ***

Talk 1: The rise of operational analytic data stores

Abstract:
Operational analytic data stores are an emerging class of databases that merge ideas from log search systems (Elastic, Splunk, etc.) and traditional analytic databases (Vertica, Teradata, etc.). Popular open source projects in this class include Apache Druid (incubating), ClickHouse (from Yandex), Pinot (from LinkedIn), Palo (from Baidu), and more. We will discuss the motivation behind these databases, then cover the history, architecture, and future of Druid in detail.

Speaker: Gian Merlino
Gian is an Apache Druid (incubating) PMC member and a co-founder of Imply. Previously, Gian led the data ingestion team at Metamarkets and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.
------------------------------------------------------------------------------------------------------
Talk 2: Data modeling tradeoffs with Druid (Lyft)

Abstract:
When working with an in-memory database and large volumes of data, infrastructure costs can quickly become a concern. Fortunately, many data modeling techniques and Druid features can help keep costs down. Summarization, sketches, sampling, and other methods can be combined to achieve the desired results while staying within reasonable cost boundaries. In this talk, we'll explore how to do more with less and describe a methodology for limiting Druid data source sizes while still delivering reliable, fast analytics.

Speaker: Maxime Beauchemin
Maxime Beauchemin works as a Senior Software Engineer at Lyft, where he develops open source products that reduce friction and help generate insights from data. He is the creator and a lead maintainer of Apache Airflow (incubating), a data pipeline workflow engine, and of Apache Superset (incubating), a data visualization platform. He is recognized as a thought leader in the data engineering field.

Before Lyft, Maxime worked at Airbnb on the Analytics & Experimentation Products team. Previously, he worked at Facebook on computation frameworks powering engagement and growth analytics, on clickstream analytics at Yahoo!, and as a data warehouse architect at Ubisoft.
------------------------------------------------------------------------------------------------------
Talk 3: Streaming SQL and Druid (Lyft)

Abstract:
Druid provides sub-second query latency, and Flink provides SQL on streams, allowing rich transformation and enrichment of events as they happen. In this talk, we will show how Lyft uses Flink SQL and Druid together to support real-time analytics.

Speaker: Arup Malakar
Arup is a Software Engineer on the Data Platform team at Lyft. Before Lyft, he helped build the data platforms at Ooyala and Yahoo!
------------------------------------------------------------------------------------------------------

*** Schedule ***

6:00 - 6:30 pm: Check-in and networking
6:30 - 6:35 pm: Intros
6:35 - 7:10 pm: Talk #1
7:15 - 7:50 pm: Talk #2
7:55 - 8:30 pm: Talk #3
8:30 - 8:45 pm: Wrap-up