6:30 - Pizza and Networking
7:00 - Announcements
7:05 - Karthik Mokashi: Deep Learning in Predicting Market Movements
7:30 - John Mount: rquery: a Query Generator for Working With SQL Data Sources From R
8:10 - Dave Hurst: Double dribble: Two practical use cases for the googledrive package
Deep Learning in Predicting Market Movements
This talk gives a preview of work in progress. In this initiative we examine whether neural networks can reliably predict the forward movement of the S&P 500 index. We examine over 50 years of daily S&P returns and engineer the data set for processing by a neural network. We have used H2O with R and TensorFlow/Python to build and validate our results.
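As a rough illustration of what engineering a return series for a neural network can look like, the sketch below turns daily returns into fixed-length lookback windows with a next-day up/down label. This is only an assumed, generic framing, not the speaker's actual method; the function name `make_windows`, the `lookback` parameter, and the toy numbers are hypothetical.

```python
def make_windows(returns, lookback):
    """Turn a daily return series into (features, label) pairs.

    features: the `lookback` daily returns preceding day t
    label:    1 if the return on day t is positive, else 0
    (Hypothetical framing for illustration; not the talk's pipeline.)
    """
    rows = []
    for t in range(lookback, len(returns)):
        features = returns[t - lookback:t]   # past returns only, no look-ahead
        label = 1 if returns[t] > 0 else 0   # direction of the forward move
        rows.append((features, label))
    return rows


# Made-up returns for illustration only, not real S&P 500 data.
toy_returns = [0.01, -0.02, 0.005, 0.003, -0.01, 0.02]
windows = make_windows(toy_returns, lookback=3)

for features, label in windows:
    print(features, "->", label)
```

Each row uses only information available before the day being predicted, which is the property any such data set needs before it is handed to a model.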
John Mount, Win-Vector LLC
rquery: a Query Generator for Working With SQL Data Sources From R
rquery ( https://github.com/WinVector/rquery ) is an R package for data wrangling on SQL databases and Spark. I will start with a demonstration of "piped SQL" and move on to powerful operator pipelines. Piped SQL allows those merely familiar with SQL to build up powerful expert-level multi-stage data transforms using fragments of SQL and legible pipe notation for composition (much clearer than typical SQL nesting). From there I will move on to powerful non-SQL operator notation based on Codd's work and influenced by experience working with SQL and dplyr at big data scale. Such piped operator notation allows even those unfamiliar with SQL to build, test, use, and maintain powerful data processing pipelines at big data scale. The rquery system includes simple and regular rules for building up data processing pipelines from basic primitives and includes powerful operations such as SQL window functions. rquery is a "query first" package where both data processing pipelines and SQL queries are inspectable objects. rquery has proven to generate high-performance queries and be reliable in managing complex data workflows at scale.
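To make the "query first" idea concrete, here is a toy sketch of a pipeline whose stages are ordinary objects that can be inspected and then rendered to nested SQL. It is written in Python purely for illustration and is not rquery's API; the class names (`Table`, `SelectRows`, `SelectColumns`), the method names, and the `orders` table are all hypothetical.

```python
class Node:
    """Base class: every pipeline stage can be chained and rendered to SQL."""

    def select_rows(self, condition):
        return SelectRows(self, condition)

    def select_columns(self, columns):
        return SelectColumns(self, columns)


class Table(Node):
    """Leaf node describing a database table (hypothetical example)."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = list(columns)

    def to_sql(self):
        return f"SELECT {', '.join(self.columns)} FROM {self.name}"


class SelectRows(Node):
    """Keep the rows of the source that satisfy a SQL condition fragment."""

    def __init__(self, source, condition):
        self.source = source
        self.condition = condition
        self.columns = list(source.columns)

    def to_sql(self):
        return f"SELECT * FROM ({self.source.to_sql()}) AS sq WHERE {self.condition}"


class SelectColumns(Node):
    """Keep only the named columns of the source."""

    def __init__(self, source, columns):
        self.source = source
        self.columns = list(columns)

    def to_sql(self):
        return f"SELECT {', '.join(self.columns)} FROM ({self.source.to_sql()}) AS sq"


# Build the pipeline left to right; nothing is executed against a database.
pipeline = (
    Table("orders", ["id", "amount"])
    .select_rows("amount > 100")
    .select_columns(["id"])
)
sql = pipeline.to_sql()
print(sql)
```

The pipeline reads top to bottom in the order the steps apply, while the generated SQL nests them inside out, which is the readability gap that piped notation is meant to close.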
John Mount is a data scientist working for the consulting firm Win-Vector LLC. He is one of the authors of the popular data science book "Practical Data Science with R" (Zumel, Mount; Manning 2014) and a frequent author and speaker on machine learning and data science topics. He is also a frequent contributor to the popular Win-Vector technical blog: http://www.win-vector.com/blog/ .
Double dribble: Two practical use cases for the googledrive package
googledrive is a tidyverse package that allows you to interact with files on Google Drive from R. We'll explore practical use cases that demonstrate the usefulness of the package as well as the basics of working with dribbles and listcols. Dribbles are an interesting example of 'rectangling' data that may not easily fit into a data frame, so many of the techniques used by the package have broader application for handling data in R.