Real-time Analytics with Apache Druid at FullContact

58 people went

Details

Speaker Bios

Jeremy Plichta is a Director of Engineering at FullContact, where he helps lead the DevOps, Foundations/Integrations, and Application Security teams. He has worked with technologies such as Hadoop, Spark, and Kafka, and helped bring Apache Druid into the FullContact ecosystem to solve API usage and aggregation problems. When he isn’t working, he enjoys spending time with his wife and three kids, reading great sci-fi books, working out, and snowboarding.

Janis Dancis is a Sr. Software Engineer at FullContact, where he focuses on building systems to capture and analyze our high-volume API usage data, providing tooling to integrate that data into our accounting, invoicing, and other back-office systems, and developing the front-end applications our customers use to interact with our platform. His favorite JVM technology is Clojure, which he is still trying to use to build a rocketship at FullContact. He gets his thrills outside the office by racing rally cars up steep hills and around tight corners.

Gian Merlino is a co-founder and the CTO of Imply, a San Francisco-based technology company. Gian is also one of the main committers to Druid. Previously, Gian led the data ingestion team at Metamarkets and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.

FullContact Talk Summary

FullContact is building one of the leading identity-resolution-as-a-service platforms to help brands and businesses connect with their customers on a more personal level. Doing this means keeping track of billions of identity resolution events that arrive through both the API and batch processing. When FullContact went to the whiteboard to design a system that could track all of this, it came up with a streaming pipeline architecture that flows all usage data into both S3 and Druid. The new system scales flexibly and gives customers near-real-time insight into their API usage patterns. In this talk, Janis and Jeremy will discuss what this pipeline looks like at a high level, some interesting problems they had to solve along the way, and other Druid features they should be leveraging to make the whole thing even better!
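To make the "flows into both S3 and Druid" idea concrete, here is a minimal Python sketch, assuming a single stream of usage events fanned out to two sinks: a raw archive standing in for S3, and a rollup store standing in for Druid that pre-aggregates counts per minute, account, and endpoint. The class names, event fields, endpoint names, and one-minute granularity are all hypothetical; this is not FullContact's code, and a real deployment would use a message bus and Druid's own streaming ingestion rather than in-process objects.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical event shape: which account called which endpoint, and when.
    account_id: str
    endpoint: str
    timestamp_ms: int

class RawArchive:
    """Stand-in for S3: keeps every event verbatim for replay and billing audits."""
    def __init__(self):
        self.events = []

    def write(self, event):
        self.events.append(event)

class RollupStore:
    """Stand-in for Druid: stores one count per (minute, account, endpoint)."""
    def __init__(self):
        self.counts = defaultdict(int)

    def write(self, event):
        # Truncate the timestamp to the start of its minute (assumed granularity).
        minute_ms = event.timestamp_ms // 60_000 * 60_000
        self.counts[(minute_ms, event.account_id, event.endpoint)] += 1

    def usage_for(self, account_id):
        # Total calls for one account across all minutes and endpoints.
        return sum(
            count
            for (_minute, acct, _endpoint), count in self.counts.items()
            if acct == account_id
        )

def fan_out(events, sinks):
    """Send every event to every sink, so both views see the same stream."""
    for event in events:
        for sink in sinks:
            sink.write(event)

archive, rollup = RawArchive(), RollupStore()
fan_out(
    [
        UsageEvent("acct-1", "enrich", 1_000),
        UsageEvent("acct-1", "enrich", 59_000),
        UsageEvent("acct-2", "resolve", 61_000),
    ],
    [archive, rollup],
)
print(len(archive.events), len(rollup.counts), rollup.usage_for("acct-1"))  # 3 2 2
```

The archive keeps all three events, while the rollup collapses the two acct-1 calls in the first minute into a single row. Trading raw detail for compact, fast-to-query aggregates is the reason keeping both copies can be worthwhile.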

Imply Talk Summary

The dirty secret of most “streaming analytics” technologies is that they are just stream processors: they sit on a stream and continuously compute the result of one particular query. They’re good for alerting, keeping a dashboard up to date in real time, and streaming ETL, but they’re not good at powering apps that give you true insight into what is happening. For that, you need the ability to explore, slice and dice, drill down into, and search the data. This talk will cover the current state of the streaming analytics world and what Apache Druid, a real-time analytical database, brings to the table.
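The distinction can be illustrated with a toy in-memory Python sketch. The first class behaves like a stream processor: it keeps one running answer decided in advance. The second keeps the events so new questions can be asked later, which is the capability the talk attributes to an analytical database like Druid. Field names and values are hypothetical, and the store is a deliberately naive list scan; Druid itself stores data in indexed, columnar segments and is far more involved.

```python
from collections import Counter

class FixedQueryProcessor:
    """Stream-processor style: maintains the answer to ONE query chosen up front
    (events per account). Any other question needs a new job."""
    def __init__(self):
        self.per_account = Counter()

    def on_event(self, event):
        self.per_account[event["account"]] += 1

class QueryableStore:
    """Analytical-database style: keeps the rows, so any dimension can be
    grouped or filtered after the data has arrived."""
    def __init__(self):
        self.rows = []

    def on_event(self, event):
        self.rows.append(event)

    def group_by(self, dimension, where=None):
        # Count rows per value of `dimension`, keeping only rows matching `where`.
        where = where or {}
        return Counter(
            row[dimension]
            for row in self.rows
            if all(row.get(key) == value for key, value in where.items())
        )

events = [
    {"account": "a", "country": "US", "endpoint": "enrich"},
    {"account": "a", "country": "DE", "endpoint": "resolve"},
    {"account": "b", "country": "US", "endpoint": "enrich"},
]

processor, store = FixedQueryProcessor(), QueryableStore()
for event in events:
    processor.on_event(event)
    store.on_event(event)

print(dict(processor.per_account))  # {'a': 2, 'b': 1}
print(dict(store.group_by("country")))  # {'US': 2, 'DE': 1}
print(dict(store.group_by("endpoint", where={"country": "US"})))  # {'enrich': 2}
```

The processor can report events per account, but answering "events per country" would mean deploying a new query; the store answers that, and filtered variants of it, from data it already holds.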

*** Schedule ***

6:00 - 6:30 -- People shuffle in, grab food and drinks, and chat
6:30 - 6:50 -- Jeremy and Janis, FullContact, with Q&A
7:00 - 7:45 -- Gian, Imply, with Q&A
8:00 - 8:30 -- Druid roadmap discussion and wrap-up