Title: 100s of billions of interconnected profiles
The Audience Manager backend's global profile store is maintained in a large HBase cluster, where each profile is modeled as a row in a table.
This denormalized form is the right model for profiles, since they are populated independently (data is captured at the profile level) and each profile has its own lifecycle (retention, etc.).
Previously, the backend supported only profile-level processing, which fits this data model well.
However, as business requirements evolved, we needed to be able to "merge" multiple profiles and apply business rules (e.g. segmentation rules) over a set of connected devices (e.g. all devices belonging to a person, or all devices in a household).
This talk presents the journey of evolving the backend processing to tackle this profile merge over 100s of billions of interconnected profiles. The solution needs:
* to be able to evaluate newly created segment rules on a regular basis
* to be cost-effective
* to scale: be able to reprocess the entire graph (100s of billions of devices) in a reasonable amount of time (hours)
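At its core, merging profiles over connected devices amounts to computing connected components over the device graph and then applying segment rules per merged cluster. A minimal sketch of that idea in plain Python, using union-find over a hypothetical in-memory edge list (the device IDs, edges and rule are illustrative, not Audience Manager's actual Spark/HBase implementation):

```python
# Sketch: group connected device profiles via union-find, then evaluate
# a segment rule over each merged cluster. All data here is made up.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

# Edges link devices known to belong to the same person / household.
edges = [("phone-1", "laptop-1"), ("laptop-1", "tv-1"), ("phone-2", "tablet-2")]

uf = UnionFind()
for a, b in edges:
    uf.union(a, b)

# Group devices by their cluster root.
clusters = {}
for device in {d for e in edges for d in e}:
    clusters.setdefault(uf.find(device), set()).add(device)

# Example segment rule applied per merged profile: "household with 3+ devices".
segment = [c for c in clusters.values() if len(c) >= 3]
print(sorted(map(sorted, segment)))  # [['laptop-1', 'phone-1', 'tv-1']]
```

At the scale described in the talk the edge list does not fit in one machine's memory, which is exactly why a distributed engine such as Spark is needed; the clustering logic, however, is the same.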
Tags: Adobe Audience Segmentation and Management, Apache Spark, HBase, AWS EMR
Title: Processing 8B events per day in realtime: patterns & techniques
With data at the core of the modern business, there’s an ever-growing need to process never-ending streams of changing data in real time. But designing & operating stream processing applications is not an easy task. This talk will cover some of the challenges of building reliable, distributed stream processing using Edge Profile as a case study.
Edge Profile is a lightweight subset of the centrally stored profile data, housed on edge locations. It processes profile updates and distributes them to multiple edges across the world — in the US, Europe and Asia. At its core, a streaming application built on top of Kafka Streams processes over 8B profile updates per day in real time.
This talk will walk through the patterns used to design & build this streaming application, such as relying on a share-nothing cluster of machines. In the process, it will introduce Kafka Streams, a lightweight library that applications can embed to express stream processing operations. It will then discuss the techniques used to tune the application for high throughput, including how to cater for slow consumers that risk degrading performance and how to gracefully handle bursts in traffic. Finally, it will present some of the lessons we learned along the way, such as the importance of monitoring from day one.
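The share-nothing pattern mentioned above can be sketched in a few lines: updates are hash-partitioned by profile ID, and each worker instance owns a disjoint set of partitions together with its own private state, so instances never coordinate on shared state. This is a plain-Python illustration of the idea (the partition count, worker layout and state shape are assumptions for the sketch, not the Kafka Streams implementation from the talk):

```python
# Sketch of share-nothing stream processing: hash-partition updates by key,
# give each worker exclusive ownership of some partitions and their state.

import hashlib

NUM_PARTITIONS = 4

def partition_for(profile_id: str) -> int:
    # Stable hash so the same profile always routes to the same partition.
    digest = hashlib.md5(profile_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

class Worker:
    """Owns some partitions and a private (unshared) state store."""
    def __init__(self, partitions):
        self.partitions = set(partitions)
        self.state = {}  # local per-profile state: list of updates seen

    def process(self, profile_id, update):
        assert partition_for(profile_id) in self.partitions
        self.state.setdefault(profile_id, []).append(update)

# Two instances split the four partitions between them.
workers = [Worker([0, 1]), Worker([2, 3])]

def route(profile_id, update):
    p = partition_for(profile_id)
    for w in workers:
        if p in w.partitions:
            w.process(profile_id, update)
            return

for i in range(100):
    route(f"profile-{i}", {"seq": i})

# Every profile's state lives on exactly one worker.
print(sum(len(w.state) for w in workers))  # 100
```

Because a profile's state lives on exactly one instance, scaling out is a matter of reassigning whole partitions to new instances, which is how Kafka Streams distributes work as well.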
As per our standard format, the agenda will be as follows:
6.30pm - Arrive,
7.00pm-8.00pm - Talks,
8.00pm-9.00pm - Drinks, Pizza