Observability: Automating Your Understanding of Context


Details
Making good technical decisions requires understanding your context and making appropriate choices for your circumstances. Andy Domeier shares with us how SPS Commerce applies automation to solve this.
Automate Your Context
Andy Domeier (Director - Technology Operations at SPS Commerce)
Complex systems are complex. As your company grows or your architecture evolves, you need to think about how to keep up with the changes and make sense of it all. Reacting to the right thing quickly is becoming a greater challenge every day. Too often, engineers respond to the fire alarm without a clear indication of where the fire actually is.
At SPS we have a small Reliability Engineering team, and to make the most positive impact we can, we've focused our efforts on Observability and, more importantly, on Automating Context. When a service is having performance issues, there are many relevant data points we need to get into the hands of our engineers as soon as possible. Was there a recent change? Are all of that service's dependencies healthy? Did any dependencies of this service recently change? Have we responded to this situation before? Are other upstream services impacted by this? How is this impacting our customers? What is impacted if a critical third-party object store is unavailable? The list goes on.
We'd like to share our journey toward better automating context with an event-driven architecture. This has the power to help everyone working to deliver a service be more effective and more informed. Posting service dependencies to Change and Incident records, automatically appending documentation to alerts, and automating monitoring setups are some of the solutions we've worked on to date, with positive results. We hope our story can help others and create more conversation in the industry about investing in automating your context.
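To make the idea concrete, here is a minimal Python sketch of what attaching context to an alert could look like. It is an illustration only, not SPS Commerce's implementation: the service names, the `DEPENDENCIES` and `RUNBOOKS` tables, the `Alert` class, and the `enrich` function are all hypothetical, and a real system would pull this data from a service catalog and change records rather than in-memory dictionaries.

```python
from dataclasses import dataclass, field

# Hypothetical service catalog: service name -> direct dependencies.
DEPENDENCIES = {
    "order-api": ["auth", "object-store"],
    "auth": ["user-db"],
    "object-store": [],
    "user-db": [],
}

# Hypothetical documentation links, keyed by service name.
RUNBOOKS = {"order-api": "https://wiki.example.com/runbooks/order-api"}


@dataclass
class Alert:
    service: str
    message: str
    context: dict = field(default_factory=dict)


def transitive_dependencies(service, graph):
    """Return every service reachable from `service`, sorted by name."""
    seen = set()
    stack = list(graph.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return sorted(seen)


def enrich(alert, graph, runbooks, recent_changes):
    """Attach dependencies, related recent changes, and a runbook link."""
    deps = transitive_dependencies(alert.service, graph)
    related = {alert.service, *deps}
    alert.context = {
        "dependencies": deps,
        "recent_changes": [c for c in recent_changes if c["service"] in related],
        "runbook": runbooks.get(alert.service),
    }
    return alert


# Example: a change to a dependency two hops away surfaces on the alert.
changes = [
    {"service": "user-db", "summary": "schema migration"},
    {"service": "billing", "summary": "deploy"},
]
alert = enrich(Alert("order-api", "latency high"), DEPENDENCIES, RUNBOOKS, changes)
print(alert.context)
```

In an event-driven setup, a function like `enrich` would run whenever an alert, Change, or Incident event arrives, so the responder sees dependencies and recent changes without having to look them up.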
About the speaker
Andy has been in Technology Operations leadership with SPS Commerce for the past 13 years. SPS grows very aggressively, creating an environment of crazy, fun, persistent growth challenges. Andy spends many mental cycles collaborating to find effective patterns for monitoring and operating complex, changing systems. Andy leads the System Operations, Reliability Engineering, and Continuous Improvement teams at SPS.
Dinner & drinks will be served. SPS Commerce is our venue host; food and drink sponsorship provided by Datadog (https://www.datadoghq.com/).
Schedule:
6pm: Doors open
6:30pm: Welcome, sponsor, and talk
9pm: Close
