We’re hosting the first Two Sigma Open Source meetup on Monday, 11/13 at 6pm! These quarterly meetups will focus on the open source projects that Two Sigma cares most about, from projects we created in-house and then open sourced to large external open source projects that we depend on to do our work. The inaugural TSOS meetup will feature talks from three TS-ers, each introducing a TS-originated open source project. We’re expecting about an hour of talks total, followed by chatting and pizza!
Doors open at 5:30 pm; the presentations will begin at 6 pm.
You’ll need to check in with security and present ID on arrival. Be sure to sign up (with your real first and last name) if you’re going to attend; unfortunately, people whose names aren’t entered into the security system in advance won’t be allowed in.
Cook: a framework for fairly scheduling batch workloads on Mesos (Wil Yegelwel)
At Two Sigma, we have thousands of machines and hundreds of people who want access to compute, often more than we can provide at any one time. Dynamically sharing compute among those users as demand changes is essential to ensuring that everyone can get their work done quickly and that we use our resources efficiently. We built Cook to address these concerns.
From a user’s perspective, Cook provides a queue: users submit shell commands, and Cook runs them on machines in the cluster as soon as possible. When few other users are on the cluster, each user may get a large share of resources; when many people want compute at once, each receives a smaller share. Users have learned that they can submit their workloads and be confident they will complete in a reasonable time.
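To make the queue model concrete, here is a minimal sketch of what a job submission might look like via Cook’s REST scheduler API. The field names follow Cook’s public API documentation; the endpoint URL, command, and resource values are purely illustrative.

```python
import json
import uuid


def make_cook_job(command, cpus=1.0, mem_mb=512, max_retries=1):
    """Build a job description in the shape Cook's REST API expects.

    Field names follow Cook's public scheduler API; the specific
    values here are illustrative only.
    """
    return {
        "uuid": str(uuid.uuid4()),   # client-supplied id for tracking the job
        "command": command,          # shell command to run on some cluster node
        "cpus": cpus,                # requested CPU share
        "mem": mem_mb,               # requested memory, in MB
        "max_retries": max_retries,  # how many times Cook may retry on failure
    }


# A submission is a POST of {"jobs": [...]} to the scheduler, e.g.
#   requests.post(cook_url + "/rawscheduler", json={"jobs": [job]})
# (cook_url is a hypothetical base URL for your Cook installation).
job = make_cook_job("python train.py --epochs 10", cpus=4.0, mem_mb=4096)
payload = json.dumps({"jobs": [job]})
```

Cook then picks when and where each queued job actually runs, which is the subject of the rest of the talk.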
While scheduling, Cook must decide when to run a particular task and where to place it. Answering these questions well is fundamental to sharing the cluster fairly among our many users, but it is not enough. When the cluster is fully utilized and a new user submits tasks, we want those tasks scheduled quickly; sometimes that means preempting tasks of users who are already running, so that all users get a fair share of the cluster. These three concerns (when, where, and preemption) mirror the three main components of Cook.
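The fairness goal above can be illustrated with max-min fair sharing: satisfy small demands fully and split what remains evenly among the users who still want more. This is a conceptual sketch of the idea, not Cook’s actual scheduling algorithm.

```python
def max_min_fair(demands, capacity):
    """Allocate `capacity` units of a resource across users via max-min
    fairness: no user receives more than they asked for, and capacity a
    satisfied user leaves behind is redistributed to still-hungry users.

    Illustrative sketch only -- Cook's real scheduler is more involved.
    """
    alloc = {u: 0.0 for u in demands}
    remaining = float(capacity)
    hungry = {u for u, d in demands.items() if d > 0}
    while hungry and remaining > 1e-9:
        share = remaining / len(hungry)      # equal split of what's left
        for u in list(hungry):
            take = min(share, demands[u] - alloc[u])
            alloc[u] += take
            remaining -= take
            if alloc[u] >= demands[u] - 1e-9:
                hungry.discard(u)            # demand satisfied; stop feeding
    return alloc
```

For example, with demands of 2, 4, and 10 units and only 9 units of capacity, the small user gets everything they asked for (2) while the other two split the remaining 7 evenly (3.5 each).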
In this talk, Wil will discuss why Two Sigma built Cook, Cook’s high-level architecture, how we achieve fairness across our many users, and some challenges we are facing.
BeakerX: A collection of kernels and extensions to the Jupyter interactive computing environment (Scott Draves)
BeakerX is a set of Jupyter Notebook extensions that enable polyglot data science, time series plotting and processing, interactive tables, and research publication. The Beaker project began five years ago as a standalone notebook, and in 2016 the team made the decision to redesign the software to integrate tightly with the Jupyter platform. Draves will explore both the evolution of the thinking that led to this pivot and the evolution of the software itself, review the Jupyter extension architecture, explain how BeakerX plugs into that architecture, and discuss the current set of BeakerX capabilities. The team couldn’t undertake a project this big on its own; Draves will also cover the partnerships that make this work possible and present the roadmap the team created with these partners.
Waiter: a platform for automatic scaling, load balancing, and versioning of service-oriented architectures (Shams Imam)
One of the key challenges in developing a service-oriented architecture (SOA) is anticipating traffic patterns and scaling the number of running instances of services to meet demand. In many situations, it’s hard to know how much traffic a service will receive and when that traffic will come. A service may see no requests for several days in a row and then suddenly see thousands of requests per second.
If developers underestimate peak traffic, their service can quickly become overwhelmed and unresponsive, and may even crash, resulting in constant human intervention and poor developer productivity. On the other hand, if they provision sufficient capacity upfront, the resources they allocate will be completely wasted when there’s no traffic.
In order to allow for better resource utilization, many cluster management platforms provide auto-scaling features. These features tend to auto-scale at the machine/resource level (as opposed to the request level) or to defer to logic in the application layer. A better approach is to run services when, and only when, there is traffic.
Waiter is a distributed auto-scaler that delivers this optimal type of request-level auto-scaling. It requires no input or handling from applications and is agnostic to underlying cluster managers; it currently uses Mesos, but can easily run on top of Kubernetes or other solutions.
Another challenge with SOAs is enabling the evolution of service implementations without breaking downstream customers. On this front, Waiter supports service versioning for downstream consumers by running multiple, individually addressable versions of services. It automatically manages service lifecycles and reaps older versions after a period of inactivity.
With a variety of unique features, Waiter is a compelling platform for applications across a broad range of industries. Existing web services can run on Waiter without modification as long as they communicate over HTTP and support routing client requests to arbitrary backends. Two Sigma has employed the platform in a variety of critical production contexts for over two years, with workloads reaching hundreds of millions of requests per day.