
Enter the Experimentation Data Science Rabbit Hole

Hosted By
Steve U. and 2 others

Details

Come join us for an evening of talks by data scientists from Netflix, Intuit, Twitter, and Airbnb as they describe the test design and analysis techniques they use.

This time we’re being graciously hosted by the University of San Francisco (USF), where you should expect the usual setup of Experimentation Platform demos along with light food and non-alcoholic drinks while you mingle with fellow experimenters.

Schedule:

6 to 7: Sign-in, experimentation platform demos, and socializing

7 to 8:30: Talks

Nathaniel Stevens (https://www.linkedin.com/in/nathanieltylerstevens/): USF

Bio: Nathaniel is an Assistant Professor of Statistics at the University of San Francisco, where he teaches jointly in the BS in Data Science and MS in Analytics programs. His interests include experimental design, time series analysis, machine learning, exploratory data analysis, and data visualization.

Abstract: In this talk I will advertise “A/B Testing and Beyond: Designed Experiments for Data Scientists”, a continuing education certificate offered by USF’s Data Institute. This course will expose participants to the value of experimentation in the field of data science and provide a thorough treatment of available methods and best practices in the design and analysis of experiments.

Lucile Lu (https://www.linkedin.com/in/luo-lucile-lu-85b65848/): Twitter

Bio: Lucile Lu is the tech lead of the metrics & experimentation team at Twitter, where she has worked on experimentation for 3.5 years. She owns the statistics and metrics components of Duck Duck Goose, Twitter's AB testing framework, as well as experiment consulting and education for experimenters. Her interests also include sequence analysis and metric meta-analysis.

Abstract: It is not uncommon for an experiment to be restarted when a bug is found, more traffic is needed, or a new metric is added. However, there is no single right answer for how to bucket users in the new version of the experiment. Should we reuse the users from the previous version? Or should we flush them and resample from the remaining users? The problem is more challenging than it appears. I'll go over various options based on how the metrics are calculated and how user behavior changed from the old version of the experiment.
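As a rough illustration of the mechanics behind these options (a generic sketch, not how Duck Duck Goose actually works), many experimentation platforms bucket users deterministically by hashing the user ID with an experiment-specific salt: reusing the old salt on restart keeps users in their previous buckets, while a new salt resamples everyone.

```python
# Generic sketch of deterministic hash bucketing (an assumption for
# illustration, not Duck Duck Goose's actual implementation).
import hashlib

def bucket(user_id: str, experiment_salt: str, n_buckets: int = 1000) -> int:
    """Map a user to a bucket deterministically from (salt, user_id)."""
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

user = "user_12345"
print(bucket(user, "exp_v1"))  # original experiment
print(bucket(user, "exp_v1"))  # restart that reuses the salt: same bucket
print(bucket(user, "exp_v2"))  # restart with a new salt: users are reshuffled
```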

Joshua Parks (https://www.linkedin.com/in/joshuaparks/): Netflix

Bio: Joshua is a data scientist at Netflix, where he drives algorithm experimentation for personalized recommendations and search. Prior to Netflix, he ran experiments as both a product manager and a data scientist focusing on user growth and monetization. Most recently he was at Uber, where he led data efforts for strategic partnerships.

Abstract: Product experimentation typically involves comparing metrics between two distinct groups of users, one that receives a treatment and another that serves as control. In this talk, we will discuss an alternative testing method of paired comparisons, in which each user receives both the treatment and control experience. We'll describe potential advantages of this method, particularly around variance reduction in metrics, and present an example of its application at Netflix: testing ranking algorithms.
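To make the variance-reduction point concrete, here is a small simulated example (hypothetical numbers, not Netflix's data or code): when each simulated user contributes a metric under both experiences, analyzing the within-user differences cancels out user-to-user variation that a naive two-group comparison has to absorb.

```python
# Simulated illustration of variance reduction from paired comparisons
# (hypothetical numbers; not Netflix data or code).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_users = 5_000
baseline = rng.normal(10, 3, size=n_users)                    # user-to-user variation
control = baseline + rng.normal(0, 1, size=n_users)           # metric under control
treatment = baseline + 0.1 + rng.normal(0, 1, size=n_users)   # small true lift

# Ignoring the pairing treats the two samples as independent groups,
# so the large user-to-user variance inflates the standard error.
t_unpaired, p_unpaired = stats.ttest_ind(treatment, control)

# The paired analysis works on within-user differences, where the
# shared baseline cancels out.
t_paired, p_paired = stats.ttest_rel(treatment, control)

print(f"unpaired analysis: t = {t_unpaired:.2f}, p = {p_unpaired:.3f}")
print(f"paired analysis:   t = {t_paired:.2f}, p = {p_paired:.3f}")
```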

Colin Dillard (https://www.linkedin.com/in/colindillard/): Intuit

Bio: Colin Dillard is a data scientist at Intuit. He wrote the original statistics implementation for Wasabi, Intuit's open-source AB testing service. He has also worked on a variety of other projects at Intuit including personalization, fraud detection, and customer matching.

Abstract: AB testing is an inherently uncertain process. In fact, to even run a test you have to specify the chance of error you are willing to accept. Unfortunately, many common testing practices, such as evaluating multiple metrics or stopping a test before it has completed, increase the chance of error. Unless steps are taken to prevent these behaviors or correct for their effects, your testing will be less accurate than you expect.

In this talk I will give an overview of this common problem and demonstrate its impact, offer practical advice on how to reduce it, and, as time permits, highlight some mathematical corrections that can systematically resolve these issues.
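A small simulation makes the "increased chance of error" concrete (an illustrative sketch, not Wasabi's implementation): with five metrics and no true effect, declaring a win whenever any metric crosses p < 0.05 is wrong far more than 5% of the time, while a Bonferroni-adjusted threshold restores the nominal rate.

```python
# Simulated A/A tests showing error-rate inflation from checking multiple
# metrics, plus a Bonferroni correction (illustrative sketch, not Wasabi code).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_metrics, n_sims, n_users = 0.05, 5, 2_000, 500

naive_wins = corrected_wins = 0
for _ in range(n_sims):
    # A/A test: no true difference on any metric
    a = rng.normal(size=(n_metrics, n_users))
    b = rng.normal(size=(n_metrics, n_users))
    pvals = np.array([stats.ttest_ind(a[m], b[m]).pvalue for m in range(n_metrics)])
    naive_wins += (pvals < alpha).any()                  # any metric "significant"
    corrected_wins += (pvals < alpha / n_metrics).any()  # Bonferroni threshold

print(f"false-positive rate without correction: {naive_wins / n_sims:.3f}")   # roughly 0.22
print(f"false-positive rate with Bonferroni:    {corrected_wins / n_sims:.3f}")  # roughly 0.05
```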

Cuky Perez (https://www.linkedin.com/in/cukyperez/): Airbnb

Bio: Cuky Perez is a data science manager at Airbnb, where she has gained extensive experience A/B testing new tools for Airbnb's hosts. Prior to Airbnb, she was an assistant professor at the University of Washington, teaching students the art of experimental design and data analysis.

Abstract: In this presentation I will discuss how we at Airbnb have addressed some of the pitfalls and challenges of A/B testing. For instance: what can we do when simple random assignment does not guarantee balanced characteristics across treatment and control conditions? Can we reduce the time it takes for an A/B test to reach sufficient statistical power? How do we ensure that teams make decisions based on rigorous evidence and protect against p-hacking?
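One standard answer to the first of these questions is stratified (blocked) randomization. The sketch below is a generic illustration of that idea, not Airbnb's system, and the "host tenure" stratum is a hypothetical attribute chosen for the example.

```python
# Generic stratified random assignment (illustrative; not Airbnb's system).
# Users are shuffled and split within each stratum, so treatment and control
# end up balanced on the stratifying characteristic.
import random
from collections import defaultdict

def stratified_assign(user_strata, seed=42):
    """user_strata: dict of user_id -> stratum label. Returns user_id -> arm."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for user_id, stratum in user_strata.items():
        by_stratum[stratum].append(user_id)

    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        half = len(members) // 2
        for user_id in members[:half]:
            assignment[user_id] = "treatment"
        for user_id in members[half:]:
            assignment[user_id] = "control"
    return assignment

# Hypothetical host attribute used as the stratum
users = {f"host_{i}": ("new" if i % 3 else "experienced") for i in range(12)}
print(stratified_assign(users))
```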

Additional Transportation Information:

For transportation, note that this location is one block from the Transbay Terminal, so any MUNI route that goes there is a good option. We're also just a five-minute walk from the Embarcadero BART station on Market Street. There is no parking at 101 Howard designated for USF use; however, there is a public parking garage about half a block away at 121 Spear Street, at the Rincon Center.

The Journey from A/B Testing to Personalization