Matt Adereth on "A Scalable Bootstrap for Massive Data"

Details
Mini
Marios Assiotis (http://marios.io) on "Throttling Utilities in the IBM DB2 Universal Database Server" (http://www.nt.ntnu.no/users/skoge/prost/proceedings/acc04/Papers/0354_ThA01.3.pdf)
Marios' Bio
Marios is the CTO at TubiTV, the world's largest free streaming TV & movie library. His interests include simplifying complex systems, storage, and low-latency network I/O at scale. A transplant from Cyprus, he spends his free time trying to create the perfect all-American burger.
Main Talk
Matt Adereth (https://twitter.com/adereth) from Two Sigma will present "A Scalable Bootstrap for Massive Data" (http://arxiv.org/pdf/1112.5016.pdf). Bootstrapping is a powerful statistical technique for assessing the quality of estimators. It's computationally intensive, and it's not immediately obvious how to apply it efficiently in a distributed environment.
We'll go through a history of computational methods for assessing estimator quality from the Jackknife Method (1949) to today, explaining why you want to do this and assuming only basic statistical knowledge. While the paper gets pretty hot and heavy with the math, we'll keep it light.
I love this paper because it's a look at how the world of statistics has to adapt to the new reality of distributed compute.
Matt's Bio
Matt builds tools and infrastructure for quantitative research at Two Sigma. He previously worked at Microsoft on Visio, focusing on ways to connect data to shapes. In his spare time, he builds ergonomic keyboards using Clojure.
Meeting mechanics
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be food.
After the paper is presented, we will open up the floor for discussion and questions, and then we will head over to the bar!
Remember
PWL SF strictly adheres to the Code of Conduct (https://github.com/papers-we-love/papers-we-love/blob/master/CODE_OF_CONDUCT.md) set forth by all PWL charters.
