LOPSA-LA, UUASC, and LinuxLA present:
A Working Theory of Monitoring
by Caskey L. Dickson, Site Reliability Engineer, Google Inc.
Note: RSVP at the LOPSA event page before Oct. 11 so that visitor badges can be printed.
At Google we have discovered many common pitfalls and false simplifications that cause frustration and blind spots in monitoring systems. Internally we run our own home-grown monitoring systems, but to move beyond a hit-and-miss approach to monitoring we have developed a formal model for such systems. This model serves as a framework for developing, evaluating, and evolving monitoring systems at Google that are suitable for operating at scale.
We will present our model, show how existing open source solutions fit (and don't fit!) into that model, and invite attendees to contrast it with their own experiences. The goal is to encourage a broader discussion of the theory of monitoring and of how current solutions can evolve into more effective tools for operators of large systems.
Caskey Dickson is a Site Reliability Engineer/Software Engineer at Google, where he writes and maintains monitoring services that operate at "Google scale." He has worked in online service development since 1995. Before coming to Google he was a senior developer at Symantec, wrote software for various internet startups such as CitySearch and CarsDirect, ran a consulting company, and even taught undergraduate and graduate computer science at Loyola Marymount University. He holds an undergraduate degree in Computer Science, a Master's in Systems Engineering, and an M.B.A., all from Loyola Marymount.
PARKING AND DIRECTIONS
* There is no on-site parking; please use street parking or the public lots surrounding the Google facility.
* The entrance is on the southeast corner of the building (corner of Sunset and Hampton), through a vehicle gate. Do not come to the main entrance; staff there will just send you around back.
* Google security will be at the gate entrance. State that you are here for the 'LOPSA meetup' and they will admit you.