Official January 2018 Meetup

Details

6:30 Pizza and Networking
7:00 Announcements
7:05 Lightning Talk: John Mount: cdata Fluid Data Transformations for R
7:20 Lightning Talk: Robert Horton: Performance vs. Simplicity: Visualizing the tradeoffs from pruning decision trees
7:35 Oswald Campesato: R and Deep Learning

##########################

John Mount

cdata Fluid Data Transformations for R

Abstract: In my lightning talk I'll briefly introduce the cdata R package. cdata is Win-Vector's second-generation "coordinatized data" tool for R. cdata supplies generalizations of pivot and un-pivot where one specifies the desired transform by supplying a control table that is literally a picture of the desired transformation. I will show how to quickly solve a data transformation problem in terms of this table. cdata is based on DBI, which lets it perform the same transformations at big data scale (right now targeting Spark or PostgreSQL).
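As a rough illustration of the control-table idea, here is a minimal Python sketch (not R, and not the cdata API). The helper name `unpivot_with_control_table`, the column names, and the sample numbers are all hypothetical. The control table is laid out like one block of the desired output: the key column holds literal labels, and every other cell names the source column whose value should land there.

```python
def unpivot_with_control_table(rows, control_table, key_column, id_columns):
    """Turn each wide row into a block of tall rows shaped like control_table.

    Hypothetical helper for illustration only; it is not part of cdata.
    """
    out = []
    for row in rows:
        for control_row in control_table:
            # Carry the identifying columns through unchanged.
            record = {c: row[c] for c in id_columns}
            for column, cell in control_row.items():
                if column == key_column:
                    # Key cells are literal labels for the new rows.
                    record[column] = cell
                else:
                    # Other cells name the wide column to read from.
                    record[column] = row[cell]
            out.append(record)
    return out


# One wide record with four measurements (made-up numbers).
wide = [
    {"id": 1, "train_auc": 0.9, "test_auc": 0.8, "train_r2": 0.5, "test_r2": 0.4},
]

# The control table is a "picture" of the block each wide row becomes.
control = [
    {"measure": "auc", "train": "train_auc", "test": "test_auc"},
    {"measure": "r2", "train": "train_r2", "test": "test_r2"},
]

tall = unpivot_with_control_table(wide, control, "measure", ["id"])
for record in tall:
    print(record)
```

Changing the shape of the output only requires editing the control table, not the function, which is the appeal of specifying a transform this way.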

###########################

Robert Horton

Performance vs. Simplicity: Visualizing the tradeoffs from pruning decision trees.

I will be demonstrating a Shiny app intended to help business users see how lift plots, cumulative gain curves, and decision trees are interconnected, so they can better understand how simplifying the tree (and the rules it produces) affects the predictive performance of the model. There is often a "sweet spot" where a moderately simplified tree has optimal cross-validation performance; simpler trees may be easier to understand, but this typically comes at the cost of decreased performance. However, performance is often more important in some areas than others (for example, if you are interested in the customer segment most likely to purchase a widget, performance of the model on the less likely segments may be less useful), so it is important to understand where the performance is being lost. This simple tool recapitulates many of the important concepts in data science, and we hope it can help bring subject matter experts deeper into the modeling process.
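For readers unfamiliar with the curves mentioned above, the following Python sketch shows one common way to compute cumulative gain and lift from model scores. It is not the Shiny app's code; the function names and the toy scores and labels are assumptions made for illustration. Running it on scores from a full tree and from a pruned tree would show where along the ranking the simpler model loses performance.

```python
def cumulative_gain(scores, labels):
    """Return (fraction_targeted, fraction_of_positives_captured) points.

    Cases are ranked from highest to lowest score; labels are 0 or 1.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(labels)
    points = []
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += labels[i]
        points.append((rank / len(scores), captured / total_positives))
    return points


def lift(points):
    """Lift is the gain divided by what random targeting would capture."""
    return [(fraction, gain / fraction) for fraction, gain in points]


# Toy data (assumed): four customers, two of whom purchased.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]

gain_points = cumulative_gain(scores, labels)
lift_points = lift(gain_points)
for (fraction, gain), (_, lift_value) in zip(gain_points, lift_points):
    print(f"top {fraction:.0%}: gain={gain:.2f} lift={lift_value:.2f}")
```

A lift above 1 in the top-ranked segment means the model finds likely purchasers faster than random targeting, which is the segment the abstract singles out as mattering most.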

##########################

Oswald Campesato

Title: "R and Deep Learning"

Description: "This fast-paced introduction to Deep Learning (DL) concepts is intended for Data Scientists who are interested in DL. Topics include neural networks, back propagation, activation functions, CNNs, RNNs (if time permits), along with the CLT/AUT/fixed-point theorems, and code samples in R/Keras and R/TensorFlow."

About me: "Oswald is a former PhD Candidate (ABD) in Mathematics, an education junkie (6 degrees), and an author of 17 technical books (including Angular). He has worked for Oracle, AAA, and Just Systems of Japan, along with various startups. He has lived/worked in 5 countries on three continents, and in a previous career he worked in South America, Italy, and the French Riviera, and has traveled to 70 countries on six continents. He has worked in roles ranging from C/C++/Java developer to CTO, is comfortable in 4 languages, and wants to become fluent in Japanese. Currently he is a consultant and provides training in Android, R, and Deep Learning."