Monitoring and distributed tracing at Slack and Pinterest

Details

We have two awesome speakers for our next meetup on May 30th!

The first talk is by Naoman Abbas from Pinterest on distributed tracing

Abstract
Like most modern large-scale applications, Pinterest is built on a microservices architecture. In this scheme, a number of services work together to serve a single user request. Debugging performance and architectural problems in this environment can be challenging. Distributed tracing has emerged as an indispensable tool for addressing these challenges.

At Pinterest, we deployed Pintrace, a Zipkin-based distributed tracing system. Pintrace records end-to-end performance data across the execution path of requests, from mobile applications to backend services. Pintrace has evolved over time as its users find new data and as new subsystems integrate with our tracing systems. We’ve built tools for visualization, feature extraction, aggregation and analysis of trace data. These tools help enable use cases that wouldn’t have been possible with traditional tooling, such as root-cause analysis, latency analysis and regression analysis.

In this talk, I will share the tools we’ve built to process trace data, the use cases they’ve enabled and some real-world examples. Through it, I hope to help you understand how you can apply these techniques to your own challenges.

Bio
Naoman Abbas is a software engineer on the visibility team at Pinterest, where he leads Pintrace, their distributed tracing system. Prior to Pinterest, Naoman worked at Netflix and Microsoft as a software engineer building cloud platform components.

The second talk is by George Luong from Slack on monitoring

Over the last four years, Slack has implemented various monitoring solutions as our requirements evolved. First we had Ganglia. Then we had the great migration to Graphite. Now it’s time for Prometheus.

Perhaps the most daunting task was converting our webapp-generated metrics, which involved sharding and federating metrics from servers around the world, all while upholding the principles of reliability, scalability, and operability.

Join us as we recount the challenges we encountered, the findings we discovered, and the successes we achieved.

George is an operations engineer on Slack’s Visibility Infrastructure team, where he helps build and maintain Slack’s metrics and logging infrastructures. He has a degree in human biology from UC San Diego, but stumbled into tech shortly after. On a recent trip to Australia, he honed his steel slinging and earned a lifetime membership at a darts hall. Unfortunately, this is his proudest accomplishment.

Look forward to seeing folks there!