Longtime member Aaron Schumacher (https://github.com/ajschumacher) is talking to us this month about merging data with R and other tools.
Thank you to iHeart for hosting us and providing refreshments.
About the talk:
Combining data sets can be a huge pain, with possible problems both obvious and insidious. Aaron will present practical approaches for detecting and avoiding potential pitfalls, as well as rigorous and repeatable processes for generating merge tables through reduction to de-duplication. The focus will be on techniques for quickly achieving high accuracy for data sets of moderate size, with brief excursions into the entity resolution literature, machine learning for distance metrics, and clustering and visualization techniques, including multidimensional scaling.
Aaron (http://planspace.org/aaron/) is an instructor at the Metis (http://www.thisismetis.com/) data science boot camp. He is the original author of the rjstat (http://cran.r-project.org/web/packages/rjstat/) package for R, which is mostly unrelated to this talk. He made emacs.link (http://emacs.link/), which is also not related to this talk. Aaron blogs at plan ➔ space (http://planspace.org/) and tweets as @planarrowspace (https://twitter.com/planarrowspace).
Pizza begins at 6:30 and the talk at 7; afterward we'll go to a nearby bar.