GameAnalytics' Migration to Apache Druid + Performance Optimizations & Roadmap

London Apache Druid Meetup by Imply

Holden House

57 Rathbone Pl · London

How to find us

Let the security desk know you're headed to the Druid meetup on the 4th Floor, at eOffice. Once you get off the elevator, look for the eOffice door and ring the doorbell!


Details

Meetup Agenda:
6pm – 6:30pm: Pizza and networking
6:30 – 7pm: Talk #1 from Ramón Lastres Guerrero, GameAnalytics
7 – 7:40pm: Talk #2 from Benjamin Hopp, Imply
7:40 – 8pm: Q&A and wrap up

Talk #1: At GameAnalytics we receive and process real-time behavioural data from more than 100 million daily active users, helping thousands of game studios and developers improve their games. In this talk, you will learn how we migrated our backend systems from an in-house streaming analytics service to Apache Druid, and the lessons we learned along the way. By working with Imply and adopting Druid, we have been able to reduce development costs, increase the reliability of our systems and implement new features that would not have been possible with our old stack.

Bio: Ramón Lastres Guerrero has a background in backend development and distributed systems, and has worked for five years as a consultant specialised in the Erlang programming language (also Elixir). He spent some of those years working on payments systems (Skrill and Vocalink) and joined GameAnalytics around two years ago as a member of the backend team. It was his first time working with big data, and he’s been a Druid user since then. At the moment he manages the engineering efforts at GameAnalytics, where there’s a team of 15 engineers.

Talk #2: Druid is an emerging standard in the data infrastructure world, designed for high-performance slice-and-dice analytics (“OLAP”-style) on large data sets. This talk is for you if you’re interested in learning more about pushing Druid’s analytical performance to the limit. Perhaps you’re already running Druid and are looking to speed up your deployment, or perhaps you aren’t familiar with Druid and are interested in learning the basics. Some of the tips in this talk are Druid-specific, but many of them will apply to any operational analytics technology stack.

The most important contributor to a fast analytical setup is getting the data model right. The talk will center on the choices you can make when preparing your data to get the best possible query performance.

We’ll look at some general best practices to model your data before ingestion such as OLAP dimensional modeling (called “roll-up” in Druid), data partitioning, and tips for choosing column types and indexes. We’ll also look at how more can be less: often, storing copies of your data partitioned, sorted, or aggregated in different ways can speed up queries by reducing the amount of computation needed.
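
As a rough illustration of the roll-up idea mentioned above, the sketch below pre-aggregates raw events in plain Python by truncating timestamps to the hour and summing a metric per combination of dimensions. It is not Druid's ingestion API; the function name `roll_up` and the event fields (`ts`, `game`, `country`, `revenue`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from datetime import datetime


def roll_up(events, dimensions, metric, granularity="hour"):
    """Pre-aggregate raw events: truncate the timestamp, group by the
    given dimensions, and keep a row count and a sum of the metric."""
    formats = {"hour": "%Y-%m-%dT%H:00", "day": "%Y-%m-%d"}
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        bucket = datetime.fromisoformat(event["ts"]).strftime(formats[granularity])
        key = (bucket,) + tuple(event[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]
    return [
        {
            "ts": key[0],
            **dict(zip(dimensions, key[1:])),
            "count": agg["count"],
            f"{metric}_sum": agg["sum"],
        }
        for key, agg in sorted(totals.items())
    ]


# Hypothetical raw events (illustrative data only).
events = [
    {"ts": "2019-01-01T10:05:00", "game": "a", "country": "GB", "revenue": 2},
    {"ts": "2019-01-01T10:40:00", "game": "a", "country": "GB", "revenue": 3},
    {"ts": "2019-01-01T10:59:00", "game": "b", "country": "GB", "revenue": 1},
    {"ts": "2019-01-01T11:01:00", "game": "a", "country": "GB", "revenue": 5},
]

rolled = roll_up(events, ["game", "country"], "revenue")
for row in rolled:
    print(row)
```

Here four raw events collapse into three stored rows. A query that sums revenue per game and hour then reads fewer rows, at the cost of losing the individual events.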

We’ll also look at Druid-specific optimizations that take advantage of approximations, where you can trade accuracy for performance and reduced storage. You’ll be introduced to Druid’s features for approximate counting, set operations, ranking, quantiles, and more. We will finish with the latest Druid news, including details about the roadmap and recent releases.
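
To make the accuracy-for-storage trade-off concrete, here is a minimal sketch of approximate distinct counting using a k-minimum-values estimator in plain Python. It is a generic textbook technique used for illustration, not Druid's own aggregators; the function name `approx_distinct` and the parameter `k` are assumptions of this example.

```python
import hashlib
import heapq


def approx_distinct(values, k=1024):
    """Estimate the number of distinct values while remembering only the
    k smallest hash values seen (a k-minimum-values sketch)."""
    heap = []        # negated hashes, so heap[0] is the largest one kept
    members = set()  # hashes currently kept, used to skip duplicates
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") / 2**64  # pseudo-uniform in [0, 1)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count them directly
    # With k hashes kept, the k-th smallest hash estimates the density.
    return int((k - 1) / -heap[0])


# Illustrative data: 50,000 values containing 10,000 distinct ids.
user_ids = [i % 10000 for i in range(50000)]
print(approx_distinct(user_ids))
```

Memory stays bounded by `k` regardless of input size, and the relative error shrinks roughly with the square root of `k`, so a larger `k` buys accuracy at the cost of storage.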

Bio: Benjamin Hopp has been involved in architecting big data and streaming data solutions for companies of all sizes. He is currently a Solutions Architect at Imply, where he helps organizations deploy and manage Apache Druid solutions. Previously, he worked as a Senior Systems Architect at Hortonworks, specializing in streaming data use cases using HDF and Apache NiFi.