Published in 2005 in the journal Cognition, Technology and Work, "Problem Detection" explores the "process by which people first become concerned that events may be taking an unexpected and undesirable direction that potentially requires action." While the paper primarily centers on empirically rebutting previous theories of how problems are detected, it also puts forth many observations and concepts that deserve close attention from software engineers. This talk won't just restate the paper's core views; I will place them in a software engineering and operations context and connect them to the SRE and DevOps worlds in ways that may be consequential.
The paper's authors are Gary Klein, Rebecca Pliske, Beth Crandall, and David Woods.
John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments: biotech, government, online media, social networking, and e-commerce. John’s publications include the books The Art of Capacity Planning (2009) and Web Operations (2010) as well as the foreword to “The DevOps Handbook”. His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation”, helped start the DevOps movement.
John served as CTO at Etsy, holds an MSc in Human Factors and Systems Safety from Lund University, and is currently a Principal Researcher at Adaptive Capacity Labs.