Big data systems for operational analytics


Details
Important Note: You must register for the event (free) on ti.to beforehand. You will then be sent an eNDA, which must be signed at least 24 hours before the event for security reasons. A badge will be pre-printed for you when you arrive. Please register at the link below. If for some reason you are unable to sign the eNDA online, you can still attend, but you may have to wait in a long line at the sign-in desk.
Ti.to link: https://ti.to/big-data/big-data-systems-for-operational-analytics/with/vryrkr9c46c
Talk #1: The rise of operational analytic data stores
Abstract:
Operational analytic data stores are a new, emerging class of databases that merge ideas from log search systems (Elastic, Splunk, etc.) and traditional analytic databases (Vertica, Teradata, etc.). Popular open source projects in this class include Apache Druid (incubating), ClickHouse (from Yandex), Pinot (from LinkedIn), Palo (from Baidu), and more. We will discuss the motivation behind these databases, then cover the history, architecture, and future of Druid in detail.
Bio:
Gian Merlino is an Apache Druid (incubating) PMC member and a co-founder of Imply. Previously, Gian led the data ingestion team at Metamarkets and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.
Talk #2: Data modeling tradeoffs with Druid
Abstract:
When dealing with an in-memory database and large volumes of data, limiting infrastructure costs can quickly become a concern. Luckily, there are many data modeling techniques and Druid features that can be used to mitigate costs. Summarization techniques, sketches, data sampling, and other methods can be combined to achieve the desired results while staying within reasonable cost boundaries. In this talk, we'll explore how to do more with less, and describe a methodology for limiting Druid data source sizes while delivering reliable, fast analytics.
Bio:
Maxime Beauchemin works as a Senior Software Engineer at Lyft, where he develops open source products that reduce friction and help generate insights from data. He is the creator and a lead maintainer of Apache Airflow [incubating], a data pipeline workflow engine, and of Apache Superset [incubating], a data visualization platform. He is recognized as a thought leader in the data engineering field.
Before Lyft, Maxime worked at Airbnb on the "Analytics & Experimentation Products team". Previously, he worked at Facebook on computation frameworks powering engagement and growth analytics, on clickstream analytics at Yahoo!, and as a data warehouse architect at Ubisoft.
Talk #3: Streaming SQL and Druid
Abstract:
Druid provides sub-second query latency, and Flink provides SQL on streams, allowing rich transformation and enrichment of events as they happen. In this talk, we will describe how Lyft uses Flink SQL and Druid together to support real-time analytics.
Bio:
Arup Malakar: Arup is a Software Engineer at Lyft, working on the Data Platform team. Prior to Lyft, Arup helped build the data platforms at Ooyala and Yahoo!
Agenda:
6:00 - 6:30 pm: Check-in, settling in, and networking
6:30 - 6:35 pm: Intros
6:35 - 7:10 pm: Talk #1
7:15 - 7:50 pm: Talk #2
7:55 - 8:30 pm: Talk #3
8:30 - 8:45 pm: Wrap-up