Data Science 201
Instructor: Dr. Michael Bowles
The class will meet on 4 Saturday mornings. Registration covers all 4 meetings. Early registration is $325; last-minute registration is $375. Pre-register on Eventbrite at http://datascience201.eventbrite.com or register at the first class meeting (check or cash).
Overview of the Course
Data Science 201 begins with ordinary least squares regression and extends this basic tool in a number of directions. You'll learn various regularization approaches. You'll learn about logistic regression and how to code categorical inputs and outputs. You'll learn how to use feature space expansions for handling non-linearities. Next, we'll go through modern high-speed algorithms for training these models on very large data sets (LARS, Glmnet).
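To give a flavor of how regularization extends ordinary least squares, here is a minimal sketch for the simplest possible case: one predictor and no intercept. It is not taken from the course materials (the class examples are in R); the function name fit_slope, the toy data, and the penalty value are all assumptions made up for illustration. With a penalty of zero the function returns the ordinary least squares slope; with a positive penalty it returns the ridge slope, which is shrunk toward zero.

```python
# Minimal sketch: OLS vs. ridge regression for one predictor, no intercept.
# The helper name, data, and penalty value are hypothetical illustrations.

def fit_slope(x, y, penalty=0.0):
    """Return the slope b minimizing sum((y - b*x)^2) + penalty * b^2."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Setting the derivative of the objective to zero gives b = sxy / (sxx + penalty).
    return sxy / (sxx + penalty)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

ols_slope = fit_slope(x, y)                 # penalty = 0 is ordinary least squares
ridge_slope = fit_slope(x, y, penalty=5.0)  # positive penalty shrinks the slope

print("OLS slope:  ", ols_slope)
print("Ridge slope:", ridge_slope)
```

The penalty appears only in the denominator, so a larger penalty always pulls the fitted slope closer to zero. This is the basic trade-off behind the regularization approaches listed above.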
Text: "The Elements of Statistical Learning - Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
See also Prof. Robert Tibshirani's notes for Stats 315a: http://www-stat.stanford.edu/~tibs/stat315a.html
Data Science 201 and 202 employ beginner-level probability, calculus, and linear algebra (e.g. peruse the appendices in "Introduction to Data Mining" by Tan et al., or review introductory material on linear algebra and probability theory). If you have taken intro Data Science or Machine Learning classes, you are well prepared for this course, but those are not required to start 201.
Participants should be familiar with R or be willing to pick R up outside of class. We will hand out R code for most of our examples, but we won't spend time in 201 going through introductory material on R. Come to the first class with R and RStudio installed on your computer. See http://cran.r-project.org/