

What we’re about
This is a group for Seattle/Eastside users interested in sharing knowledge about Apache Kafka - a high throughput distributed pub/sub messaging system. Apache Kafka is an extremely successful message queue for stream processing systems. The adoption of Kafka for near real time data processing has been increasing tremendously. The goal of this group is to bring together people of similar interests to discuss features, best practices for operations and deployment, and case studies of applications built using Kafka.
Upcoming events (1)
- IN-PERSON: Apache Kafka® x Apache Flink® Meetup
Garvey Schubert Barer, Seattle, WA
Join us for an Apache Kafka® x Apache Flink® Meetup on Thursday, May 1st from 6:00pm hosted by Uber!
If you RSVP here, you don't also need to RSVP on the Seattle Flink Group.
📍Venue:
Uber
1191 2nd Ave, Seattle, WA 98101
🗓 Agenda:
- 6:00pm: Doors Open/Welcome, Drinks
- 6:15pm - 7:00pm: Zhifeng Chen, Senior Staff Engineer, Uber & Si Lao, Staff Engineer, Uber
- 7:00pm - 7:45pm: David Anderson, Principal Software Practice Lead, Confluent
- 7:45pm - 8:30pm: Food, Additional Q&A, Networking
💡 First Speakers:
Zhifeng Chen, Senior Staff Engineer, Uber & Si Lao, Staff Engineer, Uber
Talk:
Kafka Cluster of Cluster
Abstract:
Kafka powers real-time message pub/sub at Uber. Kafka's reliability is mission-critical to the business, but we have experienced some challenges.
- Deployment-caused failures
A bad deployment of any Kafka component (broker, producer, or consumer) can delay or even prevent message delivery. Sometimes the error is too subtle to detect until it has rolled out to the entire cluster and caused a disaster.
- Noisy neighbor issue
We run multi-tenant Kafka at Uber: producers and consumers belonging to different users share Kafka clusters. Sometimes abnormal activity by one user can impact another; we call this the noisy neighbor issue. A typical case is one user accidentally sending a large amount of non-production traffic and degrading a critical production use case.
To address these challenges and enhance Kafka's reliability, we introduced the Cluster of Cluster design and deployed it at Uber. Cluster of Cluster builds the capability of physical isolation along various dimensions. With this solution we achieved:
- Canary isolation
A canary is defined as a small amount of production traffic. Along the canary dimension, we divide a Kafka cluster into a canary sub-cluster and a non-canary sub-cluster, and enforce isolation between them from producer through broker to consumer. Canary isolation enables early detection and mitigation of deployment-caused failures.
- Tenant isolation
Partitioned by the owner of Kafka topics, we divide Kafka clusters into sub-clusters for different tenants. This minimizes noisy neighbor issues caused by the unexpected behavior of Kafka clients.
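The isolation dimensions above can be pictured as a routing layer that sits in front of producers and consumers and selects a physically separate sub-cluster. The sketch below is a hypothetical illustration only; the sub-cluster names and the `pick_subcluster` function are invented for this example and are not Uber's actual API.

```python
# Hypothetical sketch of Cluster of Cluster routing: traffic is sent to a
# physically isolated sub-cluster chosen by tenant (topic owner) and by
# whether it is canary traffic. All names here are illustrative.

SUBCLUSTERS = {
    ("payments", False): "kafka-payments.prod:9092",
    ("payments", True):  "kafka-payments-canary.prod:9092",
    ("logging", False):  "kafka-logging.prod:9092",
    ("logging", True):   "kafka-logging-canary.prod:9092",
}

def pick_subcluster(tenant: str, canary: bool) -> str:
    """Return the bootstrap address of the sub-cluster owning this tenant's
    topics, split further along the canary dimension."""
    try:
        return SUBCLUSTERS[(tenant, canary)]
    except KeyError:
        raise ValueError(f"no sub-cluster registered for tenant {tenant!r}")

# A bad deployment rolled out to the canary sub-cluster first affects only
# the small slice of traffic routed there, and a misbehaving "logging"
# client cannot degrade the "payments" sub-cluster.
print(pick_subcluster("payments", canary=True))
```

Because the sub-clusters share nothing physically, a failure detected in the canary sub-cluster can be mitigated before the rollout reaches production traffic.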
💡 Second Speaker:
David Anderson, Principal Software Practice Lead, Confluent
Talk:
Unlocking the Mysteries of Apache Flink
Abstract:
Apache Flink has grown to be a large, complex piece of software that does one thing extremely well: it supports a wide range of stream processing applications with difficult-to-satisfy demands for scalability, high performance, and fault tolerance, all while managing large amounts of application state.
Flink owes its success to its adherence to some well-chosen design principles. But many software developers have never worked with a framework organized this way, and struggle to adapt their application ideas to the constraints imposed by Flink's architecture.
After helping thousands of developers get started with Flink, I've seen that once you learn to appreciate why Flink's APIs are organized the way they are, it becomes easier to relax and accept what its developers have intended, and to organize your applications accordingly.
The key to demystifying Apache Flink is to understand how the combination of stream processing plus application state has influenced its design and APIs. A framework that cares only about batch processing would be much simpler than Flink, and the same would be true for a stream processing framework without support for state.
Flink’s processing model is a good fit for two large families of applications: data pipelines handling analytical data, and event-driven applications handling operational events. I’ll begin by exploring why these use cases benefit from stream processing, and what kind of requirements they have for keeping state.
Then I’ll open up the hood, and explain how Flink's managed state is organized, and how this relates to the programming model exposed by its APIs. We'll also look at watermarking, which is a major source of complexity and confusion for new Flink developers. Watermarking epitomizes the requirement Flink has to manage application state in a way that doesn't explode as those applications run, potentially forever, on streams of data that may never end.
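The interplay between watermarks and bounded state described above can be sketched in a few lines. This is an illustrative Python model, not Flink's actual API: the watermark trails the largest event timestamp seen by a fixed out-of-orderness bound, and once it passes a window's end, the window fires and its state is discarded, which is what keeps state from growing forever on an unbounded stream.

```python
# Toy model of bounded-out-of-orderness watermarking over tumbling windows
# (illustrative only; Flink's real APIs differ). State per window is a
# running sum; firing a window deletes its state.
from collections import defaultdict

class WindowedSum:
    def __init__(self, window_size: int, out_of_order_bound: int):
        self.window_size = window_size
        self.bound = out_of_order_bound
        self.max_ts = float("-inf")
        self.windows = defaultdict(int)  # window start -> running sum (the state)

    def on_event(self, ts: int, value: int):
        """Add an event; return [(window_start, sum)] for windows the
        advancing watermark has just closed."""
        self.max_ts = max(self.max_ts, ts)
        self.windows[ts - ts % self.window_size] += value
        watermark = self.max_ts - self.bound
        fired = []
        # Fire and drop every window whose end the watermark has passed:
        # this is what bounds the state, however long the stream runs.
        for start in sorted(self.windows):
            if start + self.window_size <= watermark:
                fired.append((start, self.windows.pop(start)))
        return fired

op = WindowedSum(window_size=10, out_of_order_bound=5)
op.on_event(3, 1)          # window [0,10); watermark -2: nothing fires
op.on_event(8, 2)          # still [0,10); watermark 3
op.on_event(12, 4)         # window [10,20); watermark 7
print(op.on_event(21, 1))  # watermark 16 passes 10: [0,10) fires -> [(0, 3)]
```

Note that an event with timestamp 9 arriving after the last call would be late (the watermark has passed its window), which is exactly the trade-off the out-of-orderness bound controls.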
This talk will give you a mental model for understanding Apache Flink. You should come away with an understanding of how the concepts that govern the implementation of Flink's runtime have shaped the design of Flink's APIs.
Bio:
David Anderson
Apache Flink committer
Software Practice Lead, Confluent
@alpinegizmo
David is part of the Developer Relations team at Confluent. Since discovering Apache Flink in 2016, he has helped countless companies get started with stream processing. Previously, he worked as a consulting data engineer, designing and building data pipelines for clients with a diverse set of use cases, including search engines, machine learning, and business analytics.
***
DISCLAIMER
We do not cater to individuals under the age of 21.
If you would like to speak/host a future meetup, please email community@confluent.io