"Official" October 2018 BARUG Meetup
Details
Agenda:
7:00 Announcements
7:05 Dave Hurst - Sharing Artifacts in a Corporate Environment
7:20 Peter Li - Introduction to the cholera package
7:50 Ryan Moran - Bozo Blocking: rapid development of fraud detection models over local entity networks
8:20 Michael Kevane - Things that go wrong in R exercises for undergraduates: Scatterplots, wrangling, maps, sentiment analysis, RMarkdown
==========================
Peter Li
Introduction to the 'cholera' package
John Snow's map of the 1854 cholera outbreak in London's Soho is a classic example of data visualization. For Snow, the map helped to support his two then-contested, if not controversial, claims: that cholera is a waterborne disease and that the water pump on Broad Street was the source of the outbreak.
To evaluate whether the map does or can actually support such claims, I created the 'cholera' R package (CRAN and GitHub). The package allows you to explore, analyze and test the data embedded in the map. It does so by computing and plotting a pump's neighborhood: the set of locations defined by their "proximity" to a pump.
The talk will focus on the tools and techniques used to compute and visualize these "pump neighborhoods" and will include examples (all in R) of everything from orthogonal projection to more specialized topics like Voronoi tessellation ('deldir'), spatial data analysis ('sp'), graph/network analysis ('igraph'), generic functions (e.g., S3), and embarrassingly parallel problems ('parallel').
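For readers new to the idea, a pump neighborhood can be sketched in a few lines of base R. This is only an illustration, not the 'cholera' package's API: the coordinates are made up, the names pumps, cases, and nearest_pump are hypothetical, and "proximity" is assumed here to mean straight-line (Euclidean) distance, which is the assumption behind a Voronoi tessellation. Other definitions of proximity are possible.

```r
# Hypothetical coordinates for illustration; not taken from Snow's map
# or from the 'cholera' package.
pumps <- data.frame(id = c("A", "B", "C"),
                    x  = c(0, 10, 5),
                    y  = c(0, 0, 8),
                    stringsAsFactors = FALSE)
cases <- data.frame(x = c(1, 9, 5, 4),
                    y = c(1, 2, 7, 1))

# Squared Euclidean distance from every case (rows) to every pump (columns).
d2 <- outer(cases$x, pumps$x, "-")^2 + outer(cases$y, pumps$y, "-")^2

# Assumption: a case belongs to the neighborhood of its nearest pump.
cases$nearest_pump <- pumps$id[apply(d2, 1, which.min)]

# Group case row indices by pump to get each pump's neighborhood.
neighborhoods <- split(seq_len(nrow(cases)), cases$nearest_pump)
```

Each case is independent of the others, so the per-case distance step is the kind of work that splits naturally across cores.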
===============================
Ryan Moran
Data + Fraud, Bandcamp.com
Bozo Blocking: rapid development of fraud detection models over local entity networks
Fraud mitigation is a demanding challenge, particularly for a small organization like Bandcamp.com. In order to enhance defenses against an ever-churning tide of brilliant villains, we developed a platform in R to reduce the labor, time, and resources required to prepare, fit, and compare complex regression models. In this talk, we'll first cover several core aspects of system design, including only-as-necessary parallel predictor computation, persistent high-performance MySQL caching, and automated parameter optimization. For the finale, we'll explore the platform's most powerful capability: "local" network summary predictors, particularly those computed over the outputs of other predictive models.


