Data Science 201
Instructor: Dr. Michael Bowles
Overview of the Course
Data Science 201 begins with ordinary least squares regression and extends this basic tool in several directions. We'll consider various regularization approaches. We'll introduce logistic regression, and we'll learn how to code categorical inputs and outputs. We'll look at feature space expansions for handling non-linearities. Next, we'll go through modern high-speed algorithms (LARS, Glmnet) for training these models on very large data sets.
Text: "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
See also Prof. Robert Tibshirani's notes for Stats 315A: http://www-stat.stanford.edu/~tibs/stat315a.html
Data Science 201 and 202 use beginner-level probability, calculus, and linear algebra (e.g. peruse the appendices in "Introduction to Data Mining" by Tan et al., or introductory references on linear algebra and probability theory). If you have taken introductory Data Science or Machine Learning classes, you are well prepared for this course, but they are not required to start 201.
Participants should be familiar with R or be willing to pick up R outside of class. We will hand out R code for most of our examples, but we won't spend time in 201 going through introductory material on R. Come to the first class with R and RStudio installed on your computer. See http://cran.r-project.org/