Dapper & Distributed Tracing, presented by Dan Kuebrich

27 people went

Every 2nd Tuesday of the month

FullStory

1745 Peachtree St. NW Ste G · Atlanta, GA

How to find us

FullStory is behind the Chipotle, facing the parking lot

Details

"Dapper" is a seminal paper on distributed systems tracing, describing the design, development, and results of the eponymous system at Google. Published in 2010, this paper spurred development of similar systems, such as Zipkin (at Twitter) and OpenTracing.

Presenting the paper is Dan Kuebrich, Director of Platform Engineering at FullStory. Most relevant to this paper, Dan was the co-founder of Tracelytics, an early SaaS tracing and performance platform. In 2012, Tracelytics was rebranded as TraceView following its acquisition by AppNeta, where Dan was appointed CTO in 2015.

"Dapper, a Large-Scale Distributed Systems Tracing Infrastructure"
Original paper: https://ai.google/research/pubs/pub36356

Abstract:
Modern Internet services are often implemented as complex, large-scale distributed systems. These applications are constructed from collections of software modules that may be developed by different teams, perhaps in different programming languages, and could span many thousands of machines across multiple physical facilities. Tools that aid in understanding system behavior and reasoning about performance issues are invaluable in such an environment.

Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met. Dapper shares conceptual similarities with other tracing systems, particularly Magpie [3] and X-Trace [12], but certain design choices were made that have been key to its success in our environment, such as the use of sampling and restricting the instrumentation to a rather small number of common libraries.

The main goal of this paper is to report on our experience building, deploying and using the system for over two years, since Dapper’s foremost measure of success has been its usefulness to developer and operations teams. Dapper began as a self-contained tracing tool but evolved into a monitoring platform which has enabled the creation of many different tools, some of which were not anticipated by its designers. We describe a few of the analysis tools that have been built using Dapper, share statistics about its usage within Google, present some example use cases, and discuss lessons learned so far.