This daylong workshop introduces the R (http://www.r-project.org/) language for statistical computing and graphics and assumes some prior programming experience in another language, such as Python (http://www.python.org/), Perl (http://www.perl.org/), Matlab (http://www.mathworks.com/products/matlab/), or C/C++ (http://www.cplusplus.com/). The workshop focuses on the core of the language (working with data, data structures, base graphics, loops, and functions) and emphasizes the use of R as a programming language for challenges that go beyond standard data analysis and exploration tools. Advanced topics include regular expressions, data cleaning/munging, and an array of statistical methods for data analysis. Participants will work through examples both individually and as a group. The day will be organized around four modules:
• The core language syntax and data structures for working with and exploring data. Accessing and organizing data; arithmetic and logical operators; conditional statements; loops; subsetting; common functions; getting help and using extension packages.
• Short break
• Graphics and writing customized functions. An emphasis on base graphics; graphical output formats; customization; an introduction to lattice and ggplot2.
• From data exploration to statistical inference, including cleaning/munging data from unusual sources. Case studies: the 2000 Olympic diving competition; studying bookie point spreads on college basketball.
• Short break
• Open topics to be determined, with data examples provided by the participants
Different people approach statistical computing with R in different ways. It can be helpful to start with a real-data problem and learn something about R "on the fly" while trying to solve it. But it is also useful to have a more organized, formal introduction to the core of the language without the distraction of a complicated applied problem. This course consists of four distinct modules that overlap somewhat, reinforcing the key concepts.
Using data from the 2000 Olympic diving competition, you will learn or review a small subset of the R language and syntax that supports an impressively large portion of everyday statistical visualization and analysis. In particular, this example compares t-tests and permutation tests. We'll start with displays from R's base graphics and conclude with an introduction to grid graphics programming. Other, smaller data examples will be used throughout the workshop.
The final module of the day will be shaped around participant interests and data contributions.
This is a hands-on class, and attendees will benefit from working along with the instructor, so please bring a computer with the latest version of R installed. We strongly recommend using the RStudio (http://www.rstudio.com/) IDE.
This class differs from the Introduction to R (http://www.meetup.com/datascienceclasses/events/163115782/) class in that it is more intensive and intended for people with some programming experience.
John W. Emerson (http://www.stat.yale.edu/~jay/) ("Jay") is Director of Graduate Studies and Associate Professor of Statistics, Adjunct, Yale University.
Jay teaches a range of graduate and undergraduate courses and often brings timely real-world problems and examples into his lectures, where his teaching and research intersect. For example, he collaborated with the Wall Street Journal in uncovering the infamous stock option backdating scandal. He also demonstrated a design flaw in the new scoring system used for international figure skating competitions. Jay's courses include Introductory Statistics; Real World Statistics; Introductory Data Analysis; Theory of Statistics; Statistical Case Studies; Statistical Consulting; Advanced Data Analysis; and Statistical Computing. He has taught summer courses in statistics at Peking University and National Taipei University of Technology. He has given workshops around the world, including introductions to R at several levels and more advanced workshops on high-performance computing with R.
His research includes Bayesian change point analyses as well as a range of topics in statistical computing and graphics. He has worked towards a scalable solution for statistical computing with massive data, extending support for the management, analysis, and exploration of massive data sets in R. Jay has been Principal Investigator and is Lead Statistician on the Yale/Columbia Environmental Performance Index (EPI), which he presented at the 2010 and 2012 World Economic Forums (http://www.weforum.org/) in Davos.
Jay is a Fellow and Treasurer of the Connecticut Academy of Arts and Sciences. He served as Secretary/Treasurer of the American Statistical Association (http://www.amstat.org/) Section of Graphics, and as Program Chair of the Section on Statistical Computing. He is an Associate Editor of the Journal of Statistical Software.
Class begins at 9:30, so please arrive 15 minutes early to get settled. Lunch will be provided.
Thank you to our host Thomson Reuters (http://thomsonreuters.com/) and Cezary Podkul (http://www.meetup.com/datascienceclasses/members/7811893/) for making this possible.