"Official" October BARUG Meetup


6:30 - Pizza and Networking
7:00 - Announcements
7:05 - Dan Murphy - Benford's Law
7:30 - Mac Roach - Using R and Alteryx to Project Polling Results to Small Geographic Areas
8:05 - Pete Mohanty - Presidential Forecasting with bigKRLS


Dan Murphy

Benford's Law

Sara Silverstein of Business Insider describes Benford's Law (http://www.businessinsider.com/benfords-law-used-detect-fraud-math-forensic-accountants-2016-2) and goes on to show how Apple's financials over the last 10 years conformed to that law. She suggests that Benford's Law has a place in the toolbox of a forensic accountant. Ms. Silverstein also posits an explanation for why the law works, and investigates that theory with randomly generated calculations. In a post on my blog (http://www.triknowbits.com/) I duplicated Ms. Silverstein's random number results using R. That did not convince lay folk (like my wife, a financial expert) of the veracity of Benford's Law. So I subsequently evaluated four different financial records from the insurance space, and the law held to varying degrees. I will describe my results and show some of the R code if time permits.
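
For readers new to the law: Benford's Law says that a leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with 1. The sketch below is a minimal illustration in Python, not the R code from the talk and not necessarily the calculation used in the article or the blog post. It assumes one common kind of random experiment (multiplying several random factors together); the function names, the number of factors, and the seed are all hypothetical choices.

```python
import math
import random
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading digit of a positive number, read from scientific notation."""
    return int(f"{x:e}"[0])


def simulate_first_digit_freqs(n=20000, factors=10, seed=1):
    """Observed first-digit frequencies for products of random factors.

    Assumption for illustration: each value is the product of `factors`
    independent draws from Uniform(1, 10).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        product = 1.0
        for _ in range(factors):
            product *= rng.uniform(1.0, 10.0)
        counts[first_digit(product)] += 1
    return {d: counts[d] / n for d in range(1, 10)}


if __name__ == "__main__":
    expected = benford_expected()
    observed = simulate_first_digit_freqs()
    for d in range(1, 10):
        print(d, round(expected[d], 3), round(observed[d], 3))
```

Running the script prints the expected and simulated frequencies side by side, one row per leading digit, which makes it easy to see how closely the simulated products track the law.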


Mac Roach

Using R and Alteryx to Project Polling Results to Small Geographic Areas

Election polling generally provides a useful means of predicting voter behavior nationally and at the state level. However, due to limited sample sizes and the cost associated with polling, understanding and predicting voter behavior at a “hyper local” level is not generally done, even though this information would be invaluable for better focusing grassroots campaign activities such as door-to-door canvassing, get-out-the-vote efforts, and yard signs and other outdoor media. In this work we set out to develop a model to make election predictions at a very localized level. Our solution accomplishes this in two parts: the first is the creation of a predictive model of voter choice based on both local area factors (such as county-level Partisan Voting Index values) and individual demographic/socioeconomic characteristics, and the second is estimating the number of individuals that fall into specific demographic/socioeconomic profiles within each small geographic unit.

Our final model predicts voting outcomes at the level of census tracts (which typically have populations of around 4000 people). The results will be available for the public to explore in an interactive web app on the Alteryx Gallery in early October. In this talk we will look at the approach and methodologies used to develop these predictions for the upcoming election, which were implemented using a combination of R and Alteryx.
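
One plausible way to combine the two parts described above is to weight each profile's predicted support by the number of people with that profile in a tract. The sketch below is only an illustration in Python, not the speakers' R and Alteryx implementation; the function name, the profile labels, and every number in it are hypothetical.

```python
def tract_support(profile_counts, profile_support):
    """Population-weighted support for a candidate in one geographic unit.

    profile_counts:  estimated number of people per demographic profile
    profile_support: modeled probability that a person with that profile
                     chooses the candidate
    """
    total = sum(profile_counts.values())
    if total == 0:
        raise ValueError("tract has no estimated population")
    weighted = sum(
        count * profile_support[profile]
        for profile, count in profile_counts.items()
    )
    return weighted / total


if __name__ == "__main__":
    # Hypothetical tract of 4000 people split across two made-up profiles.
    counts = {"profile_a": 1000, "profile_b": 3000}
    support = {"profile_a": 0.60, "profile_b": 0.40}
    print(tract_support(counts, support))
```

In this made-up tract, 1000 people at 60% support and 3000 people at 40% support combine to 1800 expected supporters out of 4000, or 45%.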


Pete Mohanty

Presidential Forecasting with bigKRLS

Using publicly available data including the American National Election Studies and the Huffington Post R API, we forecast the upcoming election. The data are rich but nonetheless limited. They lack individual-level data about third-party challenger Gary Johnson, post-primary individual data, and identifying information linking claims about voting intentions to public records about actual voter behavior. We attempt to overcome these limits using a mix of Bayesian measurement techniques (available in MCMCpack) and bigKRLS, a machine/statistical learning technique which uses RcppArmadillo for speed and bigmemory to handle memory-intensive out-of-sample predictions.