Creating a successful SRE program like Netflix and Google

Hosted By
Eugene D.

Details

This talk is for engineers, DevOps practitioners, and architects who want to learn how to run production systems: what to monitor, what to alert on, how to structure the team, how to manage the deployment process, and so on.

There are many talks on how to work with data using Kafka, Flink, Spark, and other streaming technologies. But I have not come across many talks dedicated to operating these services, and many other stateless or stateful services, reliably. In this talk we will learn from Jonah Horowitz (Stripe) and Blake Bisset (Google) what SRE is and how to build it.

The event starts at 7:00 pm sharp; doors open at 6:45 pm. Please do not arrive earlier than 6:45 pm, because building security will not let anyone in before then.

Abstract:

What isn’t site reliability engineering? Lots of companies claim to have SRE teams, but some don’t quite understand the full value proposition, or which shiny technologies and organizational structures will hurt your operations rather than empower your team to accomplish its mission.

Jonah Horowitz from Stripe and Blake Bisset (formerly at Google) share stories about anti-patterns in monitoring, incident response, configuration management, and more that they’ve tripped over on their own teams, seen proposed as good practice in talks at other conferences, or heard in talks with peers in the industry. Jonah also explains how Google and Netflix view the role of the SRE and how it differs from the traditional system administrator role. You’ll learn that freedom and responsibility are key, trust is required, and chaos is (sometimes) your friend.

Speakers:

Jonah Horowitz (https://www.linkedin.com/in/jonahhorowitz/) (Stripe)

https://secure.meetupstatic.com/photos/event/9/1/2/e/600_464677166.jpeg

Jonah Horowitz is a site reliability engineer at Stripe, where he works with all of the company’s individual engineering teams to drive reliability efforts, including monitoring, alerting, deployment pipelines, and chaos resiliency. Previously, Jonah worked at several startups around the Bay Area, including Netflix, Quantcast (a leading ad-tech startup, where he grew their network to process over three million events per second), and Looksmart (a contextual advertising company), and was on the founding team of Wal-Mart.com (now @Walmart Labs), where he built out the company’s software deployment pipelines and its product image management systems.

Blake Bisset (https://www.linkedin.com/in/bisset/) (Google)

Blake Bisset got his first legal tech job at 16. He won’t say how long ago, except that he’s legitimately entitled to make shakey fists while shouting, “Get off my LAN!” He’s cofounded three startups—a joint venture with Dupont/ConAgra, a biotech spinoff from UW, and one that started this time a bunch of kids were sitting around on New Year’s Eve, wondering why they couldn’t watch movies on the internet—only to end up spending a half-decade as an SRM at YouTube and Chrome, where his happiest accomplishment was holding the go/bestpostmortem link for several years.

Agenda:

6:45-7:10 - Networking

7:15-8:00 - Creating a successful SRE program like Netflix and Google, plus Q&A

8:00-8:15 - Wallaroo announcement

Wallaroo is an ultrafast and scalable data processing engine that rapidly takes you from prototype to production by eliminating infrastructure complexity. A variety of applications can be built with Wallaroo, from microsecond response to long-running analysis, including monitoring, analytics, model training, predictive analytics, and microservices. Our goal with Wallaroo is to make it really simple to deploy and scale, with the broadest developer support, and the best performance!

Wallaroo Core will be available as open source (under the Apache 2.0 license) on 9/29/2017.

In this quick talk, you will get an overview of Wallaroo and learn how you can participate in this new community.

About Stripe:

Stripe is a US technology company, operating in over 25 countries, that allows both private individuals and businesses to accept payments over the Internet.

About our host - Lifion (http://lifion.com/about/):

https://secure.meetupstatic.com/photos/event/d/4/3/a/600_464694330.jpeg

Lifion (an ADP company) is transforming a world of HR pain into useful tools and meaningful experiences for millions of people worldwide.

They are bringing together some of the brightest developers, architects, and designers in the industry to create a next-generation HR platform.

New York City Real-Time Stream Processing User Group
Lifion by ADP
135 West 18th Street · New York, NY