Past Meetup

Statistically Validate Models with R & Transaction Data Mining with Neural Nets

134 people went


Open Data Science is excited for this co-hosted Meetup with Earnest!

What You'll Learn:

Together, we’ll learn about methods to reliably evaluate machine learning models using R and R graphics with John Mount and Nina Zumel from Win-Vector. Additionally, we'll learn how to extract relevant information from unstructured transaction data with Frank Taylor from Earnest.

Talk 1: Statistically Validating Models Using R

John Mount and Nina Zumel will demonstrate methods to reliably evaluate machine learning models using R and R graphics (including ggplot2). In particular, they will show how to plot and explain the following common measures of model performance: in-sample quality measures, out-of-sample quality measures, cross-validation measures, and permutation tests. This is an opportunity to re-learn your fundamental statistics with concrete and vibrant examples.
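For readers who have not met permutation tests before, here is a minimal sketch of the idea. It is written in plain Python rather than R, it is not taken from the speakers' materials, and the labels and predictions are made up for illustration. The helper names `accuracy` and `permutation_p_value` are hypothetical. The sketch shuffles the true labels many times and counts how often a shuffled labeling scores at least as well as the real one; if that rarely happens, the model's held-out score is unlikely to be a chance result.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between labels and predictions
        if accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    # The add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical held-out labels and model predictions, invented for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0]

observed, p_value = permutation_p_value(y_true, y_pred)
print(f"held-out accuracy = {observed:.2f}, permutation p-value = {p_value:.4f}")
```

The same pattern works with any quality score in place of accuracy, as long as the score is computed on data the model did not see during training.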

Meet Your Speakers for Talk 1:

John Mount and Nina Zumel are co-owners of Win-Vector, a company that conducts data analysis and statistical research for a variety of private sector clients, particularly in the biotech, finance, and internet sectors. Over the last eight years, Win-Vector has worked on projects such as revenue attribution for Google AdWords, customer modeling from online transaction data, product recommendation systems, and loan risk modeling.

John and Nina are the authors of Practical Data Science with R, an example-driven training book that teaches readers how to apply the R programming language and statistical analysis techniques in marketing, business intelligence, and decision support.

Nina Zumel holds a B.S. in Electrical Engineering and Computer Science from UC Berkeley, and a Ph.D. in Robotics from Carnegie Mellon University. Prior to her work at Win-Vector, Nina was a software engineer at Optivo and a co-founder at Quimba Software. While at Optivo, she developed the core numerics engine for Optivo's Price Manager, a system that monitors market demand for products on online channels, in real and semi-real time, and adjusts prices accordingly to optimize clients' business objectives. Subsequently, at Quimba Software, Nina and her team provided research services and specialty software development to clients in government and private industry in the following application areas: intelligence, homeland security, and emergency response/emergency management.

John Mount holds a B.S. in Mathematics from UC Berkeley, and a Ph.D. in Computer Science from Carnegie Mellon University. Prior to Win-Vector, John was the Senior Software Engineer at TheMoment, a Trader/VP at Bank of America, and the Director of Research at While working at TheMoment, he developed advanced Internet exchange applications sold into enterprise and ASP environments, one of which was praised by the CEO of TicketMaster/CitySearch in a CNBC interview. During his time at Bank of America, he co-managed a development group for a diverse program trading desk, and developed a significant statistical machine learning platform that allowed his team to develop and deploy profitable statistical arbitrage trading strategies.

Talk 2: Mining Noisy Transaction Data with Neural Nets

Frank Taylor will demonstrate how to extract and use relevant information from unstructured data with neural nets. Although extracting information from unstructured data is challenging, it is valuable for those making high-risk business decisions; for instance, Earnest uses unstructured transaction data for underwriting loans and for monitoring creditworthiness. The goal of this talk is to describe the performance of the model Earnest uses to mine transaction data with neural nets, and to discuss training a neural net in a large-data distributed framework like Spark.

Meet Your Speaker for Talk 2:

Frank Taylor is a data scientist at Earnest, a finance company that combines rich data analytics and elegant software to enhance people’s lives by lowering the high costs and barriers to credit faced by millions of financially responsible people. Frank utilizes the latest and most advanced tools to scale predictive APIs and recommendation systems, and he uses modeling contextualization to enhance the power of algorithms. Currently, he is working to understand Earnest's unstructured, transaction-level customer data, which comes from sources such as checking accounts, investment accounts, and credit card accounts. Earnest currently has millions of rows of this data but lacks a clear understanding of what those rows contain. In a nutshell, Frank is working to make this unstructured data usable in Earnest's decision-making processes.

Prior to working at Earnest, Frank was part of a team of data scientists at Personagraph, an innovative mobile monetization and audience intelligence platform that creates robust user profiles and improves mobile revenue for app developers. While at Personagraph, Frank leveraged known data about their clients' apps and users, and modeled app events to predict click-through rates for targeted ads in addition to churn and lifetime value. By leveraging Factorization Machines, in conjunction with probabilistic models to predict user satisfaction, Frank and his colleagues achieved an 8-fold increase in click-through rate on test runs.