Kafka Palooza: LinkedIn, Microsoft Azure, MapR


Details
Hope all of you had a great summer! We are back with the next meetup session, with multiple exciting talks. The agenda is packed, with the headliner being a talk by Todd Palino from LinkedIn. Microsoft is hosting the session in Bellevue once again, at City Center (Room 2130-2150). Please RSVP and spread the word. Hope to see you all there!
There is parking available in Level A of the City Center building, and we will have a number of free parking validations to hand out.
Agenda
6:00 pm: Doors Open
6:15 - 7:00 pm: LinkedIn talk
7:00 - 7:15 pm: Azure HDInsight talk
7:15 - 7:30 pm: Siphon talk
7:30 - 8:00 pm: MapR talk
- Apache Kafka at LinkedIn - "Multi-Tier, Multi-Tenant, Multi-Problem Kafka"
Speaker: Todd Palino, LinkedIn
Abstract: At LinkedIn, the Kafka infrastructure is run as a service: the Streaming team develops and deploys Kafka, but is not the producer or consumer of the data that flows through it. With multiple datacenters, and numerous applications sharing these clusters, we have developed an architecture with multiple pipelines and multiple tiers. Most days, this works out well, but it has led to many interesting problems. Over the years we have worked to develop a number of solutions, most of them open source, to make it possible for us to reliably handle over a trillion messages a day.
Todd Palino is a Staff Site Reliability Engineer at LinkedIn, tasked with keeping Zookeeper, Kafka, and Samza deployments fed and watered. He is responsible for architecture, day-to-day operations, and tools development, including the creation of an advanced monitoring and notification system. Previously, Todd was a Systems Engineer at Verisign, developing service management automation for DNS, networking, and hardware management, as well as managing hardware and software standards across the company.
In his spare time, Todd is the developer of the open source project Burrow, a Kafka consumer monitoring tool, and can be found sharing his experience on Apache Kafka at industry conferences and tech talks. He is also in the middle of co-authoring Kafka: The Definitive Guide, soon to be available from O’Reilly Media. When that’s not keeping him busy, you’ll find him out on the trails, training for his next marathon.
- Apache Kafka on Azure HDInsight
Speaker: Raghav Mohan, Program Manager for Azure Big Data.
At Microsoft, we have run Kafka workloads at scale on on-premises infrastructure. Recently, we have onboarded certain Kafka workloads to HDInsight, a fully managed cloud service powered by Azure. We will detail the challenges we faced in creating a managed cloud Kafka service, and the obstacles we faced in moving Kafka workloads from on-prem hosts to cloud services.
- Automating partition management in Kafka
Speaker: Som Sahu, Microsoft.
If you run a Kafka cluster in a production environment, you may already be familiar with how partition distribution among Kafka brokers becomes imbalanced over time as disks, machines, and new topics are added to the cluster. Som will talk about an automatic way to detect and fix the imbalance by distributing Kafka partitions evenly across Kafka brokers. This approach has been proven on Microsoft's Kafka clusters and could bring down operational overhead significantly.
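For readers who want a feel for the problem before the talk, here is a minimal sketch of detecting and correcting a partition imbalance. It is not the speaker's method, which this page does not describe. The function names, the `tolerance` parameter, and the plain round-robin placement are all assumptions made for illustration.

```python
from collections import Counter


def partition_counts(assignment, brokers):
    """Count replicas hosted on each broker, including brokers that host none."""
    counts = Counter({broker: 0 for broker in brokers})
    for replicas in assignment.values():
        counts.update(replicas)
    return dict(counts)


def is_imbalanced(counts, tolerance=1):
    """Treat the cluster as imbalanced when the spread exceeds `tolerance` (hypothetical threshold)."""
    return max(counts.values()) - min(counts.values()) > tolerance


def rebalance(assignment, brokers):
    """Propose a new assignment by placing replicas round-robin across brokers.

    Assumption: every partition's replication factor is at most the number of
    brokers, so consecutive round-robin slots never repeat a broker.
    """
    ordered = sorted(brokers)
    plan = {}
    cursor = 0
    for topic_partition in sorted(assignment):
        replication_factor = len(assignment[topic_partition])
        if replication_factor > len(ordered):
            raise ValueError("replication factor exceeds broker count")
        plan[topic_partition] = [
            ordered[(cursor + i) % len(ordered)] for i in range(replication_factor)
        ]
        cursor += replication_factor
    return plan


# Hypothetical cluster: broker 3 was added recently and hosts nothing yet.
assignment = {("events", 0): [1, 2], ("events", 1): [1, 2], ("events", 2): [1, 2]}
brokers = [1, 2, 3]

if is_imbalanced(partition_counts(assignment, brokers)):
    plan = rebalance(assignment, brokers)
    print(partition_counts(plan, brokers))
```

This sketch ignores the cost of moving data, rack awareness, and leader placement, all of which a production system would need to weigh. The resulting plan would also still have to be applied to the cluster, for example through Kafka's partition reassignment tooling.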
- MapR Streams and Kafka
Speaker: Will Ochandarena
Abstract: This presentation explores real-time event streaming with Kafka and MapR Streams. We’ll start with the basics, look at a few real-world use cases, and then take a deep dive into how these concepts can be extended to build a next-generation system of record. In doing so, we’ll talk about the relationship of streams to databases, which are historically thought of as the system of record, and about how this approach solves some really hard data management problems, such as synchronization of multi-model databases, data versioning, and data lineage auditing.
Will Ochandarena is Senior Director of Product Management at MapR, where he is responsible for streams and cross-platform services like containers, clouds, security, and user experience. Before entering the big data space he spent several years at Cisco managing data center switching products. He has an engineering degree from Rensselaer Polytechnic Institute and an MBA from Santa Clara University.
