Data Science by R programming(Beginner level, Five Sundays) R002

Hosted By
Vivian Z. and Scott K.
Details

Announcement: We are changing the date from five Saturdays to five Sundays. Sorry for the inconvenience!

------------------------------------------------------

Date: Feb 2nd, Feb 9th, Feb 23rd, Mar 2nd of 2014 (Five Sundays)

Time: 10:00 am to 5:00 pm (4.5 hours teaching, 1.5 hours hands-on, 1 hour break)

Instructors:

Vivian Zhang (CTO @ Supstat Inc, Master's degrees in Computer Science and Statistics)

Scott Kostyshak (Data Scientist @ Supstat Inc, 5th year Econ PhD at Princeton Univ.)

Cost:

Individual: $220/class, $1100 for all five classes

For group (5 or more people) and enterprise pricing, please email vivian.zhang@supstat.com

The class has been extended from 20 hours in the first offering to 35 hours; the new price is $1100 for five classes.

If you'd like to sign up and reserve your seat, you can use our site, http://www.nycdatascience.com . There you can learn a little more about our mission and see upcoming classes. You can also pay directly on meetup.com using the PayPal option.

Course Outline:

(Content may be adjusted depending on how the classes progress.)

Basics: 6 hours
Abstract: Covers the fundamentals of working with R. In this unit, students learn the characteristics of R, where to find resources, and the basics of programming in R.
Case and Exercise: Use R to solve selected Project Euler problems.

  • How to learn R
  • How to get help
  • R language resources and books
  • RStudio
  • Packages
  • Workspace
  • Custom Startup Items
  • Batch Mode
  • Data Objects
  • Custom Functions
  • Control statements
  • Vectorized operations

Getting data: 6 hours

Abstract: Covers the various ways R can read data. Participants learn the basic web concepts needed for web crawling, connect to a database and retrieve data with SQL statements, and read data from local files such as Excel.
Case studies and exercises: Crawl data from the Douban site; write a custom function.

  • Web data capture
  • API data source
  • Connect to the database
  • Local files
  • Other data sources
  • Data Export

Data manipulation: 6 hours

Abstract: How to use R to manipulate data, covering many kinds of data transformation, with particular attention to string processing.
Case studies and exercises: Find a QQ group (QQ is a widely used instant messaging tool), then discuss research options based on its text features.

  • Data sorting
  • Merge Data
  • Summarizing data
  • Reshaping data
  • Take a subset of data
  • String manipulation
  • Date operations

Data Visualization: 6 hours

Abstract: Covers two advanced plotting packages, lattice and ggplot2, and the various visualization methods used to explore data.
Case and Exercise: Use graphics to describe the movie, text, and other data from the previous units.

  • Histogram
  • Point
  • Column
  • Line
  • Pie
  • Box Plot
  • Scatter
  • Matrix related
  • Map

Elementary statistical methods: 6 hours
Abstract: Introduces the use of R for statistical analysis and regression analysis. Students learn the meaning and role of basic statistical models.
Case and Exercise: Use regression to predict commodity prices; simulate the winner of a casino game.

  • Descriptive Statistics
  • Statistical Distributions
  • Frequency and contingency tables
  • Correlation
  • t-test
  • Non-parametric statistics
  • Linear Regression
  • Regression Diagnostics
  • Robust Regression
  • Nonlinear regression
  • Principal Component Analysis
  • Logistic Regression
  • Statistical Simulation

Introductory data mining (if we finish the class early, we will cover selected topics based on your needs)

Abstract: Covers the use of R packages and functions for data mining. Students learn two kinds of mining methods: supervised learning and unsupervised learning.
Case and Exercise: Use R to participate in a Kaggle data mining competition.

  • General Mining Process
  • Rattle package
  • Hierarchical clustering
  • K-means clustering
  • Decision Trees
  • Backpropagation (BP) neural network
NYC Data Science Academy
AlleyNYC
500 7th Ave, 17th Floor · New York, NY