while (true) do; how hard can it be to keep running?
Caskey Dickson of Google
At Google we have more than a handful of servers and must use our administration time as effectively as possible. Between custom in-house software and off-the-shelf daemons, there are many parts to running a reliable, distributed, redundant service. Most fundamental is running the software and keeping it running. Through reboots, crashes, upgrades, downgrades, bugs, canaries, and outages, myriad forces conspire to end your process and keep it stopped, or, worse, keep it alive but not functioning.
There exist init, Upstart, rc scripts, cron, at, and more that provide mechanisms to run programs unattended, but each of them can fail in different ways. When you have dozens or hundreds of servers, they will fail in many different ways. This talk will discuss the obvious and not-so-obvious failure modes of popular packages like Upstart and cron, as well as how we've worked with and around them to ensure that when we run a daemon, it stays running. Special emphasis will be given to how virtual hosts create new challenges that can trip up launch strategies and services written for bare metal.
About the speaker
Caskey Dickson is a Site Reliability Engineer/Software Engineer at Google, where he works on infrastructure systems, writing and maintaining monitoring services that operate at Google scale. He has worked in online service development and system administration since 1995. Before coming to Google, he was a senior developer at Symantec, wrote software for various Internet startups such as CitySearch, Cars Direct, and WeddingChannel, ran a consulting company for several years, and even spent half a decade teaching undergraduate and graduate computer science at Loyola Marymount University. He has an undergraduate degree in Computer Science, a master's in Systems Engineering, and an MBA from Loyola Marymount.
Parking and Transportation; Getting In; Etc.
There is no on-site guest parking, but free street parking is available on the surrounding blocks. The Google office is also convenient to several bus stops. Use the main entrance on Main St. It's the building with the giant binoculars; it's hard to miss.
You will need to get a printed name tag, and then you'll be escorted to the room.
Please be on time.
You can come as early as 6:30 PM. We'll have pizza and drinks.
After the meeting, Caskey will join us for drinks down the street at O'Briens Pub.