The good, the bad and the ugly of ZooKeeper / Analytics on write with AWS Lambda


Details
The good, the bad, and the ugly of Apache ZooKeeper
Implementing primitives for distributed coordination such as locks, barriers, and leader election is inherently difficult. Apache ZooKeeper (https://zookeeper.apache.org/) is a system designed precisely to enable the coordination of processes in a distributed system in a general and simple manner. It exposes an interface that makes implementing such primitives much simpler. At its core, it solves a difficult problem widely known as distributed consensus. Solving consensus matters because, from a theory perspective, some of these coordination problems cannot be solved unless consensus is implemented somewhere in the system. The combination of a simple API with consensus at the core makes ZooKeeper an attractive building block for the design of many systems. ZooKeeper has been used in production at scale for many years and has been battle-tested across a number of companies.
In this talk, we cover the basics of ZooKeeper, its architecture and API, and some of the experience we gained by running it in production with a number of applications. This experience includes not only the success stories, but also the use cases that are not a good fit and the design choices that we found to be far from ideal.
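To make the idea of building a primitive on top of ZooKeeper's interface concrete, here is a minimal sketch of the well-known lock recipe based on ephemeral sequential znodes. It is not taken from the talk and does not use a real ZooKeeper client: `FakeEnsemble`, `create_ephemeral_sequential`, `expire_session`, and `holds_lock` are hypothetical names for an in-memory stand-in, used only to show the logic.

```python
import itertools


class FakeEnsemble:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, not a real client)."""

    def __init__(self):
        self._counter = itertools.count()
        self.children = {}  # znode name -> id of the session that owns it

    def create_ephemeral_sequential(self, session, prefix="lock-"):
        # ZooKeeper appends a zero-padded, monotonically increasing
        # sequence number to the names of sequential znodes.
        name = f"{prefix}{next(self._counter):010d}"
        self.children[name] = session
        return name

    def expire_session(self, session):
        # Ephemeral znodes disappear when the owning session ends,
        # so a crashed client cannot hold the lock forever.
        for name in [n for n, s in self.children.items() if s == session]:
            del self.children[name]


def holds_lock(ensemble, my_node):
    """A client holds the lock when its znode has the lowest sequence number."""
    return my_node in ensemble.children and min(ensemble.children) == my_node


zk = FakeEnsemble()
node_a = zk.create_ephemeral_sequential("client-a")
node_b = zk.create_ephemeral_sequential("client-b")
print(holds_lock(zk, node_a), holds_lock(zk, node_b))  # True False

zk.expire_session("client-a")  # client-a crashes or releases the lock
print(holds_lock(zk, node_b))  # True
```

In a real deployment a waiting client would typically set a watch on the znode just ahead of its own rather than polling, which this sketch leaves out.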
Bio: Flavio Junqueira is a member of the technical staff at Confluent (http://www.confluent.io/). Previously, he held research positions with Microsoft Research and Yahoo! Research. He holds a PhD degree in Computer Science from the University of California, San Diego, and his expertise is in distributed computing. He has made a number of contributions in this space, both academic and practical, including publications, an O'Reilly book on Apache ZooKeeper (http://shop.oreilly.com/product/0636920028901.do), and the creation of open-source projects. He is an active contributor to open-source projects such as Apache ZooKeeper, Apache BookKeeper, and Apache Kafka.
Analytics on write with AWS Lambda
Analytics on write is a four-step process:
- Read our events from our event stream
- Analyze our events using a stream processing framework
- Write the summarized output of our analysis to some form of storage target
- Serve the summarized output into real-time dashboards, reports and similar
We call this analytics on write because we are performing the analysis portion of our work prior to writing to our storage target; you can think of this as early or “eager” analysis, whereas analytics on read using Redshift or similar is late or “lazy” analysis.
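As a rough illustration of the four steps, the following sketch counts events per minute and event type before storing anything. It is an assumption-laden toy rather than the worked example from the talk: the event fields (`type`, `ts`), the function names, and the sample data are all hypothetical, a plain list stands in for the event stream, and a plain dict stands in for the storage target.

```python
from collections import Counter
from datetime import datetime, timezone


def read_events():
    """Step 1: read events (a hard-coded list stands in for the stream)."""
    return [
        {"type": "page_view", "ts": "2016-01-01T10:00:05+00:00"},
        {"type": "page_view", "ts": "2016-01-01T10:00:40+00:00"},
        {"type": "add_to_basket", "ts": "2016-01-01T10:00:59+00:00"},
        {"type": "page_view", "ts": "2016-01-01T10:01:10+00:00"},
    ]


def analyze(events):
    """Step 2: summarize events into counts per (minute, event type)."""
    counts = Counter()
    for event in events:
        ts = datetime.fromisoformat(event["ts"]).astimezone(timezone.utc)
        minute = ts.strftime("%Y-%m-%dT%H:%M")
        counts[(minute, event["type"])] += 1
    return counts


def write_summary(table, counts):
    """Step 3: add the summarized counts to the storage target (a dict here)."""
    for key, n in counts.items():
        table[key] = table.get(key, 0) + n


def serve(table, minute):
    """Step 4: read the pre-computed summary for one minute, e.g. for a dashboard."""
    return {etype: n for (m, etype), n in table.items() if m == minute}


table = {}
write_summary(table, analyze(read_events()))
print(serve(table, "2016-01-01T10:00"))
```

The point of the design is that `serve` only looks up already-aggregated rows; all the counting happened before the write, which is what makes the analysis "eager".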
In this talk, Alex will take us through the basic principles of analytics on write, and then walk through a worked example of analytics on write using AWS Lambda (https://aws.amazon.com/lambda/), Amazon Kinesis and DynamoDB.
Bio: Alex is co-founder of Snowplow Analytics (http://snowplowanalytics.com/) and the author of Unified Log Processing (https://www.manning.com/books/unified-log-processing) from Manning.
