
May 2021 Seattle DevOps Meetup - Yes, we are back!

Hosted By
Dave N. and Jason G.

Details

Hello Seattle DevOps Meetup friends! 🙌🏽

Meetups will continue to be held on one Wednesday each month via Zoom. Each meetup will feature 2 talks, each about 20 minutes long with 5 minutes reserved for Q+A. No product demos or pitches, please; we want to focus on learning, not selling.

Agenda:

4:00pm - Welcome + Networking
4:15pm - 4:45pm - Jeffrey Smith
4:45pm - 5:15pm - Jason Yee
5:15pm - Wrap up

Jeffrey Smith -

Jeff Smith has been in the technology industry for over 20 years, oscillating between management and individual contributor roles. Jeff currently serves as the Director of Production Operations for Centro, an advertising software company headquartered in Chicago, Illinois. Before that, he served as the Manager of Site Reliability Engineering at Grubhub.

Jeff is passionate about DevOps transformations in organizations large and small, with a particular interest in the psychological aspects of problems in companies. He lives in Chicago with his wife Stephanie and their two kids Ella and Xander.

Jeff is also the author of Operations Anti-Patterns, DevOps Solutions, published by Manning (https://www.manning.com/books/operations-anti-patterns-devops-solutions).

Talk Title: Troubleshooting Tiered Tragedy: A Peek Into Failure

Talk Abstract: Failure is complicated. Sometimes an incident can reveal latent failures in your systems that have just been sitting dormant, waiting for the right combination of factors to activate them. In this talk, Jeff Smith will walk through a real failure scenario and the process Centro uses to highlight issues that go beyond just the life cycle of an outage. We’ll walk through the importance of looking into signals before they become catastrophic and ensuring your team has the capacity to do so. We’ll examine how monitoring the same system from multiple vantage points can help avoid confusion and gain clarity during an incident. We’ll also look at how the Product organization plays a vital role in protecting system uptime and, lastly, how a collaborative culture can decrease your Mean Time to Recovery.

Jason Yee -

Jason Yee is Director of Advocacy at Gremlin, where he helps companies build more resilient systems by learning from how they fail. He also leads the internal Chaos Engineering practices to make Gremlin more reliable. Previously, he worked at Datadog, O’Reilly Media, and MongoDB. Outside of work, he enjoys drinking whiskey, playing Pokemon Go, and making craft chocolate.

Talk Title: Validating your incident retrospective

Talk Abstract: How many times have you responded to an incident and thought, “This seems familiar”? But after the last incident you ran a retrospective, generated action items, and implemented those changes. So what went wrong? Complex systems.
In a complex system, failures are always a combination of factors. Solving for one or more of those factors can often expose other risks that can contribute to other (sometimes similar) failures. In this talk, I’ll share how to use Chaos Engineering to validate your incident response/retrospective and uncover any latent issues they may cause.

** Times are approximate
