Data Science by R Programming (Beginner Level, Five Sundays) R002


Details
Announcement: We are changing the date from five Saturdays to five Sundays. Sorry for the inconvenience!
------------------------------------------------------
Date: Feb 2nd, Feb 9th, Feb 23rd, Mar 2nd of 2014 (Five Sundays)
Time: 10:00 am to 5:00 pm (4.5 hours teaching, 1.5 hours hands-on, 1 hour break)
Instructors:
Vivian Zhang (CTO @ Supstat Inc, Master's degrees in Computer Science and Statistics)
Scott Kostyshak (Data Scientist @ Supstat Inc, 5th-year Econ PhD student at Princeton Univ.)
Cost:
Individual: $220/class, $1100 for all five classes
For group (5 or more people) and enterprise pricing, please email vivian.zhang@supstat.com
The class has been extended from 20 hours in the first offering to 35 hours; the new price is $1100 for all five classes.
If you'd like to sign up and reserve your seat, you can use our site, http://www.nycdatascience.com . There you can learn a little more about our mission and see upcoming classes. You can also pay directly on meetup.com using the PayPal option.
Course Outline:
(Content may be adjusted depending on how the classes progress)
Basics: 6 hours
Abstract: This unit covers the basic operations of R. Students will learn the characteristics of R, how to find resources, and the fundamentals of programming.
Case and Exercise: Use R to solve selected Project Euler problems.
- How to learn R
- How to get help
- R language resources and books
- RStudio
- Packages (extensions)
- Workspace
- Custom Startup Settings
- Batch Mode
- Data Objects
- Custom Functions
- Control statements
- Vectorized operations
Getting data: 6 hours
Abstract: This unit covers the various ways to read data into R. Building on basic web knowledge, participants will learn to crawl web pages, connect to a database and retrieve data with SQL statements, and read local files such as Excel files.
Case studies and exercises: Crawl data from the Douban website and write a custom function.
- Web data capture
- API data source
- Connect to the database
- Local Files
- Other data sources
- Data Export
Data manipulation: 6 hours
Abstract: This unit covers how to manipulate data in R for all kinds of data conversion, with a focus on string processing.
Case studies and exercises: Find a QQ group (QQ is the most used instant messaging tool), then discuss research options based on its text features.
- Sorting data
- Merging data
- Summarizing data
- Reshaping data
- Taking a subset of data
- String manipulation
- Date operations
Data Visualization: 6 hours
Abstract: This unit covers two advanced plotting packages, lattice and ggplot2, and explores the various methods of visualization.
Case and Exercise: Use graphics to describe the movie, text, and other data from the earlier units.
- Histogram
- Point
- Column
- Line
- Pie
- Box Plot
- Scatter
- Matrix-related plots
- Map
Elementary statistical methods: 6 hours
Abstract: This unit introduces statistical analysis and regression analysis in R. Students will learn the meaning of basic statistics and the role of models.
Case and Exercise: Use regression to predict commodity prices; simulate the winner of a casino game.
- Descriptive Statistics
- Statistical Distributions
- Frequency and contingency tables
- Correlation
- t-test
- Non-parametric statistics
- Linear Regression
- Regression Diagnostics
- Robust Regression
- Nonlinear regression
- Principal Component Analysis
- Logistic Regression
- Statistical Simulation
Preliminary data mining (if we finish the class early, we will cover selected topics based on your needs)
Abstract: This unit covers R packages and functions for data mining. Students will learn two kinds of mining methods: supervised learning and unsupervised learning.
Case and Exercise: Use R to participate in a Kaggle data mining competition
- General Mining Process
- Rattle package
- Hierarchical clustering
- K-means clustering
- Decision Trees
- Backpropagation (BP) neural network
