R Machine Learning Class Demo Day (R010 and R012)

This is a past event

20 people went

Price: $10.00 per person


NYC Data Science Academy student demo day

Twitter: @NycDataSci (https://twitter.com/NycDataSci)

Learn with our NYC Data Science Program (http://www.nycdatascience.com/). We have offered corporate and individual training to more than 40 firms in NYC alone. We offer a 12-week immersive program as well as weekend and weeknight data science training.

Join the open house to learn more about our 12-week Data Science bootcamp (http://nycdatascience.com/bootcamp/). (Apply before the May 6th deadline.)

-----------------

The 10th batch of Data Science with R (Data Analysis level) and the 12th batch (Data Mining level) are graduating.

They learned their skills over 10 days of weekend classes and will apply them to real-world problems. You are welcome to join us and enjoy the data insights from their final projects.

If you are interested in signing up for this R course, the next offering of this class is Mar 28-Apr 25, 2015 (http://nycdatascience.com/courses/data-science-with-r-data-analysis-2/).


Speaker: Yi

Goal: predict condo price in Jersey City

Dataset: All open data

0. Tax assessment data: http://tax1.co.monmouth.nj.us/cgi-bin/prc6.cgi?ms_user=monm&district=0906

1. Condo buildings: http://livingonthehudson.com (http://livingonthehudson.com/)

2. Location primeness: http://walkscore.com (http://walkscore.com/)

3. Transit location: http://maps.google.com (http://maps.google.com/)

4. Zip code level demographics: http://city-data.com (http://city-data.com/)

Workflow: collect data, clean, merge the different sets, explore relationships, apply models, select a model (training/test split, 10 competing algorithms), finalize the model
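The model selection step in this workflow can be sketched roughly as follows. The class projects were done in R; this is a minimal Python illustration, not the speaker's code. The data is synthetic, the two candidate models stand in for the 10 competing algorithms, and all function names and the 20% test fraction are assumptions made for the example.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_mean(train):
    """Baseline candidate: always predict the mean training price."""
    mean_price = statistics.fmean(y for _, y in train)
    return lambda x: mean_price

def fit_line(train):
    """Least-squares candidate with one predictor: y = a + b * x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def rmse(model, rows):
    """Root mean squared error of a fitted model on (x, y) rows."""
    return (sum((model(x) - y) ** 2 for x, y in rows) / len(rows)) ** 0.5

# Synthetic (square feet, price) pairs, invented for illustration only.
rng = random.Random(42)
data = []
for _ in range(200):
    sqft = rng.uniform(500, 2000)
    data.append((sqft, 100_000 + 300 * sqft + rng.gauss(0, 20_000)))

# Fit every candidate on the training rows, score each on the held-out rows.
train, test = train_test_split(data)
candidates = {"mean baseline": fit_mean, "linear model": fit_line}
scores = {name: rmse(fit(train), test) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best]))
```

Scoring every candidate on the same held-out rows keeps the comparison fair; the winner would then be refit on all the data to finalize the model.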

Questions to answer:

1. Find the important factors that affect a condo's price (feature selection)

2. Find overpriced/underpriced properties

3. Is it true that the higher the floor, the higher the price?

4. What is the best season to buy a condo?

5. Is now a good year to buy a condo?

6. The impact of transportation on price (how close, how convenient)

7. Maps of condo investors (who bought the property but don't live there)
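For question 2, one common approach is to compare each asking price with the price a fitted model predicts and flag large gaps. The sketch below is in Python rather than the R used in class; the listings, the predicted prices, the 10% tolerance, and the function name are all hypothetical, and the speaker's actual method may differ.

```python
def flag_mispriced(listings, tolerance=0.10):
    """Label each listing by comparing its asking price with a model estimate.

    listings: iterable of (name, asking_price, predicted_price).
    tolerance: relative gap that counts as mispriced (assumed 10% here).
    """
    labels = {}
    for name, asking, predicted in listings:
        gap = (asking - predicted) / predicted
        if gap > tolerance:
            labels[name] = "overpriced"
        elif gap < -tolerance:
            labels[name] = "underpriced"
        else:
            labels[name] = "fair"
    return labels

# Invented listings; in practice the predictions come from the fitted model.
listings = [
    ("unit A", 550_000, 480_000),
    ("unit B", 400_000, 460_000),
    ("unit C", 505_000, 500_000),
]
labels = flag_mispriced(listings)
print(labels)
```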

Tools: lm, gbm, logistic regression, decision trees, Random Forest, PCA/PCR/PLSR


Speaker: Hans

Goal: predict conversion rate based on application

Dataset: simulated salesforce dataset

Workflow: pull data from salesforce.com, clean NAs, split dataset

Questions to answer:

1. Likelihood that a lead will convert

2. The relationship between type of business and conversion rate

3. Predict the value of the account

4. Important features for predicting value and conversion rates

5. Deep understanding of high-value accounts

6. How lead nurturing programs/lead channels link to sales

7. Improve business operations based on the insights
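Question 2, together with the NA cleaning step in the workflow, can be illustrated with a small grouping routine. This is a Python sketch rather than the R used in class; the field names `business_type` and `converted` are assumptions and not real Salesforce field names, and the records are invented.

```python
from collections import defaultdict

def conversion_rate_by_type(leads):
    """Return {business_type: converted / total}, skipping incomplete records."""
    totals = defaultdict(int)
    converted = defaultdict(int)
    for lead in leads:
        business_type = lead.get("business_type")
        outcome = lead.get("converted")
        if business_type is None or outcome is None:
            continue  # drop records with missing values (the NA cleaning step)
        totals[business_type] += 1
        converted[business_type] += int(outcome)
    return {t: converted[t] / totals[t] for t in totals}

# Invented lead records; the last two are incomplete and get dropped.
leads = [
    {"business_type": "retail", "converted": True},
    {"business_type": "retail", "converted": False},
    {"business_type": "finance", "converted": True},
    {"business_type": "finance", "converted": True},
    {"business_type": None, "converted": True},
    {"business_type": "retail", "converted": None},
]
rates = conversion_rate_by_type(leads)
print(rates)
```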

Tools: decision trees, lm, stepwise, regsubsets, PCA


Speaker: Mark Li

Goal: Predict view-ability score for each single impression

Dataset: simulated DSP log level data


Workflow:

1. Pull data from a SQL database and possibly merge it with variables from other third-party data

2. Ideally, apply Markov Chain Monte Carlo sampling to extract a prototyping dataset

3. EDA on the initial dataset to explore relationships and distributions among variables

4. Apply data transformations according to the model assumptions and business question

5. Split training and test datasets by 90% and 10%

6. Apply ensemble models on the bootstrap dataset with 5 different initial learners and iterate through each learner 30 times

7. Validate the model on the original training data and test on the test data
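The 90%/10% split and the bootstrap ensemble described above can be sketched as a simple bagging loop. This is a Python illustration, not the speaker's R code. For brevity it uses a single learner type (a one-variable least-squares line) instead of 5 different learners, and the data, function names, and seeds are invented for the example.

```python
import random
import statistics

def split_90_10(rows, seed=0):
    """Shuffle a copy of the rows and return (90% train, 10% test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 0.9)
    return shuffled[:cut], shuffled[cut:]

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_line(rows):
    """Least-squares fit of y = a + b * x; returns a prediction function."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def bagged_model(train, n_rounds=30, seed=0):
    """Fit one learner per bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    models = [fit_line(bootstrap_sample(train, rng)) for _ in range(n_rounds)]
    return lambda x: statistics.fmean(m(x) for m in models)

def rmse(model, rows):
    """Root mean squared error of a model on (x, y) rows."""
    return (sum((model(x) - y) ** 2 for x, y in rows) / len(rows)) ** 0.5

# Synthetic (feature, view-ability score) pairs, invented for illustration.
rng = random.Random(7)
data = []
for _ in range(200):
    x = rng.uniform(0, 1)
    data.append((x, 0.2 + 0.5 * x + rng.gauss(0, 0.05)))

train, test = split_90_10(data)
ensemble = bagged_model(train, n_rounds=30)
print(round(rmse(ensemble, train), 3), round(rmse(ensemble, test), 3))
```

Checking the error on the original training rows and then on the held-out 10% mirrors the validate-then-test order in the workflow; a large gap between the two numbers would suggest overfitting.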

Questions to answer:

1. Which variables have the most impact on view-ability?

2. How does each factor interact with view-ability (+/-)?

3. What is the best VCPM to bid with based on view-ability?

4. What are the different combinations of campaign set-up bucketed by view-ability?

Tools: lm, frequency table plot, box/quartile plot, knn, decision tree, PCA, leave-one-out frequency counts, normalization, glm, svm, naive bayes