Past Meetup

Nathan Taylor on OS scalability & Chris Meiklejohn on Chain Replication


129 people went


Details

****** We are closing the year with a PWL marathon ******

Talk #1 - Nathan Taylor (http://twitter.com/dijkstracula) on "Corey: An Operating System for Many Cores" (https://www.usenix.org/legacy/event/osdi08/tech/full_papers/boyd-wickizer/boyd_wickizer.pdf) and "An Analysis of Linux Scalability to Many Cores" (https://pdos.csail.mit.edu/papers/linux:osdi10.pdf)

From Nathan:

This is a story that spans two low-level systems papers. While on the surface it's all about how to make operating systems scale, it's also a story about how the same researchers can tackle a problem from different angles, succeed each time, and yet end up with very different conclusions.

At OSDI '08, Silas Boyd-Wickizer et al. published Corey: An Operating System for Many Cores (https://www.usenix.org/legacy/event/osdi08/tech/full_papers/boyd-wickizer/boyd_wickizer.pdf), which advocated a fundamental restructuring of the operating system. They observed that scalability problems often manifest because of data unintentionally shared between CPUs, so their research OS provided improved abstractions for programmers to inform the kernel about what is supposed to be local to a particular core or thread.

At the subsequent OSDI, though, the same research lab published An Analysis of Linux Scalability to Many Cores (https://pdos.csail.mit.edu/papers/linux:osdi10.pdf), where they presented the same problem, kernel scalability, but tackled it by finding individual bottlenecks and implementing tiny point fixes, mostly hidden from application code. This inverted approach worked so well that in their abstract they arguably recanted the conclusion from their earlier work: "...there is no scalability reason to give up on traditional OS organizations yet."

These papers, together, are interesting to me for a bunch of reasons. The former advocates for better application control of data sharing by exposing new abstraction primitives, whereas in the latter paper the OS remains an opaque layer, and we're to be mostly content with not having to look under the hood. Seeing two seemingly opposing philosophies work equally well was a surprising result to me. It's also interesting how the tacit assumptions we make can cause wildly different systems to be built. In one, it's treated as axiomatic that a new software architecture is needed, whereas in the other the very first sentence questions the "traditional architectures don't scale" assumption. And, of course, when so many people build careers out of asserting that There's Only One Way To Solve This Problem, it's refreshing to see the same group of researchers try different approaches in such quick succession.

Nathan's Bio

Nathan is currently working on low-latency content distribution at Fastly (http://www.fastly.com/) and has previously hacked on improving the performance of language runtimes and OS hypervisors. His first exposure to OS research came as a graduate student at the University of British Columbia.

Talk #2 - Christopher Meiklejohn on "A Brief History of Chain Replication"

Chain replication promises a high-throughput, linearizable, robust replication technique with minimal overhead, tolerating f failures with only f+1 nodes. So why do so many systems choose alternative techniques such as quorum-based or state machine replication? In this talk, we walk through a history of chain replication, starting with the original work from 2004 by van Renesse and Schneider. We will look at the various systems built using chain replication: Hibari, FAWN-KV, and CRAQ. We'll explore safer designs of chain replication, such as the elastic replication work in 2013, and finally look at new and unique designs of chain replication, such as Basho's Machi system.

The papers discussed in Chris's talk are:

• Object Storage on CRAQ - https://www.usenix.org/legacy/event/usenix09/tech/full_papers/terrace/terrace.pdf

• FAWN: A Fast Array of Wimpy Nodes - http://www.sigops.org/sosp/sosp09/papers/andersen-sosp09.pdf

• Chain Replication in Theory and in Practice - http://www.snookles.com/scott/publications/erlang2010-slf.pdf

• HyperDex: A Distributed, Searchable Key-Value Store - http://hyperdex.org/papers/hyperdex.pdf

• ChainReaction: a Causal+ Consistent Datastore based on Chain Replication - http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Almeida.pdf

• Leveraging Sharding in the Design of Scalable Replication Protocols - http://www.ymsir.com/papers/sharding-socc.pdf

Chris's Bio

Christopher Meiklejohn (https://twitter.com/cmeik) is a Senior Software Engineer with Machine Zone, Inc. working on distributed systems. Previously, Christopher worked at Basho Technologies, Inc. on the distributed key-value store, Riak. In his spare time, Christopher develops a programming language for distributed computation, called Lasp. Christopher is starting his Ph.D. studies at the Université catholique de Louvain in Belgium in 2016. http://christophermeiklejohn.com/

Meeting mechanics

Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be food.

After the papers are presented, we will open up the floor for discussion and questions, and then we will head over to the bar!

PWL SF strictly adheres to the Code of Conduct (https://github.com/papers-we-love/papers-we-love/blob/master/CODE_OF_CONDUCT.md) set forth by all PWL chapters.