Data Science by R programming(Beginner level, Five Sundays) 001


Details
Update:
Our Dec 1st class (Sunday, Thanksgiving weekend) will be held as planned.
If you can't make the Thanksgiving weekend class, you can choose to:
--attend the make-up session on Dec 7th (Sat)
--watch the video from home and take the make-up session in Feb, 2014.
----------------------------------------------------------------------------
Date: Nov 10th, Nov 17th, Nov 24th, Dec 1st, Dec 8th (Five Sundays)
Time: 12:00pm to 4pm
Instructors:
Scott Kostyshak (Data Scientist @ SupStat Inc, 5th year Econ PhD at Princeton Univ.)
Vivian Zhang (CTO @ SupStat Inc, Master's degrees in Computer Science and Statistics)
Slides contributor:
Kai Xiao (Data Scientist @ SupStat Inc), Scott Kostyshak, Vivian Zhang
Special Thanks:
We thank Ramnath Vaidyanathan (Advisory Data Scientist @ SupStat Inc, professor at McGill University), Joe Cheng (Software Engineer @ RStudio), and Josh Paulson (Product Manager @ RStudio) for suggesting a few stunning R showcases.
Cost:
Individual: $110/class
For group (5 or more people) and enterprise pricing, please email vivian.zhang@supstat.com
Course Outline:
(Content may be adjusted depending on how the classes progress)
Basics 6 hours
Abstract: covers basic operations. In this unit, students learn the characteristics of R, how to find resources, and the fundamentals of programming.
Case and Exercise: Use R to solve selected Project Euler problems.
- How to learn R
- How to get help
- R language resources and books
- RStudio
- Packages
- Workspace
- Customizing startup
- Batch Mode
- Data Objects
- Custom Functions
- Control statements
- Vectorized operations
Data acquisition 2 hours
Abstract: covers the various ways to read data into R. Participants use basic web knowledge to crawl web pages, connect to databases and retrieve data with SQL statements, and read data from local files such as Excel.
Case studies and exercises: crawl data from the Douban website and write a custom function.
- Web scraping
- API data source
- Connect to the database
- Local files
- Other data sources
- Data Export
Data manipulation 3 hours
Abstract: how to use R to manipulate data and perform all kinds of data conversions, especially string processing.
Case studies and exercises: Find a QQ group (QQ is the most used instant messenger tool), then discuss research options using text features.
- Data sorting
- Merge Data
- Summarizing data
- Reshaping data
- Subsetting data
- String manipulation
- Date operations
Data Visualization 3 hours
Abstract: covers two advanced plotting packages, lattice and ggplot2, and explores the various methods of visualization.
Case and Exercise: Use graphics to describe the movie, text, and other data from the earlier units.
- Histogram
- Dot plot
- Bar chart
- Line chart
- Pie chart
- Box Plot
- Scatter plot
- Matrix-related plots
- Map
Elementary statistical methods 5 hours
Abstract: an introduction to using R for statistical analysis and regression analysis. Students learn the meaning of basic statistics and the role of models.
Case and Exercise: Use regression to predict commodity prices; simulate the winner of a casino game.
- Descriptive Statistics
- Statistical Distributions
- Frequency and contingency tables
- Correlation
- t-test
- Non-parametric statistics
- Linear Regression
- Regression Diagnostics
- Robust Regression
- Nonlinear regression
- Principal Component Analysis
- Logistic Regression
- Statistical Simulation
Introductory data mining (Selected Topics)
Abstract: covers the R packages and functions used for data mining. Students learn the two main approaches, supervised learning and unsupervised learning.
Case and Exercise: Use R to participate in a Kaggle data mining competition
- General Mining Process
- Rattle package
- Hierarchical clustering
- K-means clustering
- Decision Trees
- Backpropagation (BP) neural network

