
Details

6:30 PM - Food and drink compliments of AdRoll
7:00 PM - Announcements
7:05 PM - Mark Hayden: Using R in Production at AdRoll
7:35 PM - John Mount: vtreat: Automating data preparation in R.
8:10 PM - John Mark Agosta: How Bayesian Probability found its way to Machine Learning via Artificial Intelligence: Forrest Gump meets The Reverend Bayes

#----------------------------------------------------

Mark Hayden
Using R in Production at AdRoll

At AdRoll, we process over 30 TB of compressed data daily, comprising over 100 billion events from around the world. We use R for a wide variety of ad-hoc analytics tasks and even in the data pipelines that power our reporting and dashboards. Mark will walk through our deployment of R on Amazon Web Services (AWS), how it works with other big data tools we use such as Presto, and what we have learned along the way.
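
The abstract doesn't say which client library the team uses; as a minimal sketch of querying Presto from R, here is one way to do it with the CRAN packages DBI and RPresto (the host, catalog, schema, and table names below are placeholders, not AdRoll's actual setup).

# Hypothetical sketch: pull an aggregate out of Presto into an R data.frame.
library(DBI)
library(RPresto)

con <- dbConnect(
  RPresto::Presto(),
  host    = "http://presto.example.com",   # placeholder host
  port    = 8080,
  user    = Sys.getenv("USER"),
  catalog = "hive",                        # placeholder catalog/schema
  schema  = "default"
)

# Let Presto do the heavy aggregation, then analyze the small result in R.
daily_events <- dbGetQuery(con, "
  SELECT date_trunc('day', event_time) AS day, count(*) AS events
  FROM events
  GROUP BY 1
  ORDER BY 1
")

dbDisconnect(con)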

Mark is AdRoll's business intelligence manager, responsible for reporting, business analytics, and product optimization.

#---------------------------------------------------

John Mount
vtreat: Automating data preparation in R.

Data preparation and cleaning is one of the biggest determiners of data science project success or failure. Many of the steps require domain knowledge and are done by hand. However, there are also a number of steps that can be, and therefore should be, automated. We will outline some of the common data problems and give a quick user’s guide to “vtreat,” an R package available on CRAN for variable treatment.
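
For a flavor of the workflow, here is a minimal sketch of vtreat's design/prepare pattern on an invented toy data frame (the data, variable names, and outcome are purely illustrative, not from the talk).

# Hypothetical example: design a treatment plan, then prepare data for modeling.
library(vtreat)

# Toy data with the usual headaches: NAs in both a categorical and a numeric column.
d <- data.frame(
  x_cat = c("a", "b", NA, "c", "b", "a"),
  x_num = c(1.0, NA, 3.0, 4.5, 2.2, NA),
  y     = c(1.1, 2.0, 1.5, 3.2, 2.1, 1.3)
)

# Design a variable treatment plan for the numeric outcome y ...
plan <- designTreatmentsN(d, varlist = c("x_cat", "x_num"),
                          outcomename = "y", verbose = FALSE)

# ... and apply it to get an all-numeric, NA-free frame ready for modeling.
d_treated <- prepare(plan, d)
head(d_treated)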

John Mount is a principal consultant at Win-Vector LLC, a San Francisco-based data science consultancy. He is a frequent writer and speaker on mathematical, statistical, machine learning, and data science topics. He is one of the authors of “Practical Data Science with R” (Manning, 2014), a popular introduction to predictive modeling techniques using the R analysis platform.

#-----------------------------------------------------

How Bayesian Probability found its way to Machine Learning via Artificial Intelligence:

Forrest Gump meets The Reverend Bayes

John Mark Agosta, Principal Data Scientist, Microsoft

Part retrospective and part prospective, this talk harkens back to the early days of reasoning with uncertainty in Artificial Intelligence (AI) and how this has evolved into the Bayesian branch of Machine Learning. Going back to developments in the 1980s with Pearl’s belief networks and their use in expert systems, I will show how these have evolved into present-day Probabilistic Graphical Models (PGMs), using some of the current R packages for PGMs.

Starting with a brief tutorial on PGM notation, I’ll give my personal view on how this branch of AI drew upon fundamentals from decision theory for methods to reason under uncertainty. Then I’ll cover how the advent of learning PGM structure from data came about by adopting Bayesian methods from statistics in the 1990s.
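
The talk leaves the choice of R packages open; as one concrete illustration, the CRAN package bnlearn can learn the structure of a discrete Bayesian network from data and then fit its conditional probability tables (a minimal sketch using bnlearn's bundled example data, not anything from the talk itself).

# Hypothetical sketch: structure learning for a discrete PGM with bnlearn.
library(bnlearn)

data(learning.test)                 # small synthetic discrete data set shipped with bnlearn

dag <- hc(learning.test)            # hill-climbing search for a network structure
fit <- bn.fit(dag, learning.test)   # estimate the conditional probability tables

dag                                 # inspect the learned arcs
fit$A                               # CPT of node A given its learned parents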

Current Bayesian machine learning methods are best understood through a 2001 paper by Andrew Ng and Michael Jordan on the distinction between generative and discriminative models (http://research.microsoft.com/apps/pubs/default.aspx?id=65088; http://ai.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf). There’s a progression from basic naïve Bayes models to more sophisticated models, concluding with a new method called “Learning with Counts” that combines naïve Bayes with current ensemble learners.
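
As a small illustration of that generative/discriminative distinction, the sketch below fits a naive Bayes classifier (generative) and a logistic regression (discriminative) to the same binary task; the data set and package choices (iris, e1071, stats::glm) are illustrative assumptions, not the speaker's.

# Hypothetical comparison: generative vs. discriminative classifiers on one binary task.
library(e1071)

d <- droplevels(subset(iris, Species != "setosa"))   # two-class problem

nb <- naiveBayes(Species ~ ., data = d)              # generative: models p(x | class)
lr <- glm(Species ~ ., data = d, family = binomial)  # discriminative: models p(class | x)

nb_pred <- predict(nb, d)
lr_pred <- ifelse(predict(lr, d, type = "response") > 0.5,
                  levels(d$Species)[2], levels(d$Species)[1])

mean(nb_pred == d$Species)   # training accuracy, naive Bayes
mean(lr_pred == d$Species)   # training accuracy, logistic regression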

Sponsors

RStudio
Financial support for meetings
R Consortium
Meetup.com membership
