
Data Science by R Programming (Beginner Level, Five Sundays) 001

Hosted By
Vivian Z. and Scott K.

Details

Update:

Our Dec 1st class (Sunday, Thanksgiving weekend) will be held as planned.

If you can't make the Thanksgiving weekend class, you can either

-- attend the make-up session on Dec 7th (Sat), or

-- watch the video from home and take the make-up session in Feb 2014.

----------------------------------------------------------------------------

Date: Nov 10th, Nov 17th, Nov 24th, Dec 1st, Dec 8th (Five Sundays)

Time: 12:00pm to 4pm

Instructors:
Scott Kostyshak (Data Scientist @ SupStat Inc, 5th-year Econ PhD at Princeton Univ.)

Vivian Zhang (CTO @ SupStat Inc, Master's degrees in Computer Science and Statistics)

Slide contributors:

Kai Xiao (Data Scientist @ SupStat Inc), Scott Kostyshak, Vivian Zhang

Special Thanks:

We thank Ramnath Vaidyanathan (Advisory Data Scientist @ SupStat Inc, professor at McGill University), Joe Cheng (Software Engineer @ RStudio), and Josh Paulson (Product Manager @ RStudio) for suggesting a few stunning R showcases.

Cost:

Individual: $110/class

For group (5 or more people) and enterprise pricing, please email vivian.zhang@supstat.com

Course Outline:

(Content may be adjusted depending on how the classes progress)

Basics 6 hours
Abstract: Covers the basic operations of R. In this unit, students learn the characteristics of R, where to find resources, and the fundamentals of programming.
Case and exercise: Use R to solve selected Project Euler problems.

  • How to learn R
  • How to get help
  • R language resources and books
  • RStudio
  • Packages
  • Workspace
  • Customizing startup
  • Batch mode
  • Data objects
  • Custom functions
  • Control statements
  • Vectorized operations
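
As a small taste of the Basics exercises, here is a minimal sketch of Project Euler problem 1 (sum all multiples of 3 or 5 below a limit) written two ways: once with a vectorized expression and once with an explicit loop and control statement. The function names `sum_multiples` and `sum_multiples_loop` are made up for this illustration and are not taken from the course slides.

```r
# Vectorized version: build the whole vector 1..(limit - 1), keep the
# multiples of 3 or 5 with a logical index, and sum them in one step.
# `sum_multiples` is a hypothetical helper name used only in this sketch.
sum_multiples <- function(limit) {
  n <- seq_len(limit - 1)
  sum(n[n %% 3 == 0 | n %% 5 == 0])
}

# Loop version: the same calculation with a for loop and an if statement.
sum_multiples_loop <- function(limit) {
  total <- 0
  for (i in seq_len(limit - 1)) {
    if (i %% 3 == 0 || i %% 5 == 0) {
      total <- total + i
    }
  }
  total
}

print(sum_multiples(10))         # 23 (3 + 5 + 6 + 9)
print(sum_multiples(1000))       # 233168
print(sum_multiples_loop(1000))  # 233168
```

Both versions give the same answer; the vectorized one is shorter and is usually the more idiomatic style in R.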

Data 2 hours

Abstract: Covers the various ways R can read data. Participants learn to scrape web pages using basic web knowledge, pull data from a database with SQL statements, and read local files such as Excel spreadsheets.
Case studies and exercises: Scrape data from the Douban website; write a custom function.

  • Web data capture
  • API data source
  • Connect to the database
  • Local files
  • Other data sources
  • Data Export
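
For a flavor of the local-file and export topics, the sketch below writes a small data frame to a CSV file and reads it back using only base R. The data frame contents and the variable names (`prices`, `csv_path`, `loaded`) are invented for this illustration. Web scraping, API access, and database connections need additional packages and are not shown here.

```r
# A tiny made-up data set, used only for illustration.
prices <- data.frame(
  item  = c("apple", "pear", "plum"),
  price = c(1.5, 2.25, 3),
  stringsAsFactors = FALSE
)

# Export: write the data frame to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(prices, csv_path, row.names = FALSE)

# Import: read the same file back into a new data frame.
loaded <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the structure of what was read.
str(loaded)
```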

Data collation 3 hours

Abstract: How to manipulate data in R, covering all kinds of data conversion, with particular attention to string processing.
Case studies and exercises: Find a QQ group (QQ is the most used instant messaging tool), then discuss research options based on its text features.

  • Sorting data
  • Merging data
  • Summarizing data
  • Reshaping data
  • Subsetting data
  • String manipulation
  • Date operations
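
Several of the topics above can be sketched in a few lines of base R. The two small data frames below (`people` and `scores`) are invented for this illustration, and the course itself may use different functions or packages for the same tasks.

```r
# Made-up example data.
people <- data.frame(
  id   = c(3, 1, 2),
  name = c("carol", "alice", "bob"),
  stringsAsFactors = FALSE
)
scores <- data.frame(
  id    = c(1, 1, 2, 3),
  score = c(90, 80, 70, 60)
)

# Sorting: order the rows of `people` by id.
sorted <- people[order(people$id), ]

# Merging: join the two data frames on their shared id column.
merged <- merge(people, scores, by = "id")

# Summarizing: mean score per person.
means <- aggregate(score ~ name, data = merged, FUN = mean)

# Subsetting: keep only rows with a score of at least 70.
high <- subset(merged, score >= 70)

# String manipulation: capitalize the first letter of each name.
capitalized <- paste0(toupper(substring(people$name, 1, 1)),
                      substring(people$name, 2))

# Dates: number of days between the first and last class.
days <- as.numeric(as.Date("2013-12-08") - as.Date("2013-11-10"))

print(means)
print(capitalized)  # "Carol" "Alice" "Bob"
print(days)         # 28
```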

Data Visualization 3 hours

Abstract: Covers two advanced plotting packages, lattice and ggplot2, and explores the various visualization methods they offer.
Case and exercise: Use graphics to describe the movie, text, and other data from the earlier units.

  • Histogram
  • Dot plot
  • Bar chart
  • Line chart
  • Pie chart
  • Box plot
  • Scatter plot
  • Matrix-related plots
  • Map

Elementary statistical methods 5 hours
Abstract: An introduction to statistical analysis and regression analysis in R. Students learn the meaning and role of basic statistical models.
Case and exercise: Use regression to predict commodity prices; simulate the winner of a casino game.

  • Descriptive Statistics
  • Statistical Distributions
  • Frequency and contingency tables
  • Correlation
  • t-test
  • Non-parametric statistics
  • Linear Regression
  • Regression Diagnostics
  • Robust Regression
  • Nonlinear regression
  • Principal Component Analysis
  • Logistic Regression
  • Statistical Simulation
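
To give a sense of the two exercises in this unit, the sketch below fits a linear regression to simulated data and then simulates a simple casino bet. Everything here is an assumption made for illustration: the "true" intercept and slope, the noise level, and the choice of game (a bet on red in American roulette, where 18 of 38 pockets are red). The actual course exercises may differ.

```r
set.seed(1)  # make the simulated numbers reproducible

# Linear regression: simulate y = 2 + 3 * x + noise, then recover the
# coefficients with lm().
x <- runif(100, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(100, sd = 1)
fit <- lm(y ~ x)
print(coef(fit))  # estimates should land close to 2 and 3

# Statistical simulation: 10,000 spins of an American roulette wheel
# (18 red, 18 black, 2 green pockets) and the share of spins won by a
# bet on red.
spins <- sample(c("red", "black", "green"),
                size = 10000, replace = TRUE,
                prob = c(18, 18, 2) / 38)
win_rate <- mean(spins == "red")
print(win_rate)   # should land close to 18 / 38
```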

Preliminary data mining (Selected Topics)

Abstract: Covers R's data mining packages and functions. Students learn two kinds of mining methods: supervised learning and unsupervised learning.
Case and exercise: Use R to take part in a Kaggle data mining competition.

  • General mining process
  • The rattle package
  • Hierarchical clustering
  • K-means clustering
  • Decision trees
  • Backpropagation (BP) neural network
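
The two clustering topics can be tried with functions that ship with R. The sketch below runs k-means and hierarchical clustering on the built-in `iris` data set. The choice of data set, the number of clusters (3), and the decision to scale the columns first are assumptions made for this illustration. Decision trees and neural networks need additional packages and are not shown.

```r
set.seed(42)  # k-means starts from random centers

# Use the four numeric measurement columns, scaled to comparable units.
measurements <- scale(iris[, 1:4])

# K-means clustering with 3 clusters and several random restarts.
km <- kmeans(measurements, centers = 3, nstart = 20)

# Hierarchical clustering on Euclidean distances, cut into 3 groups.
hc <- hclust(dist(measurements))
hc_groups <- cutree(hc, k = 3)

# Compare each clustering with the known species labels.
print(table(km$cluster, iris$Species))
print(table(hc_groups, iris$Species))
```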
NYC Data Science Academy
Flatiron School
11 Broadway, Suite 260 · New York, NY