Use machine learning to predict the virality of YouTube videos


Details
This project is a start-to-finish introduction to a supervised data science / machine learning project for newcomers, covering both structured and unstructured data. I hope that intermediate users will also find something useful to take away, so let's jump in!
Business Objective: All businesses want to offer the right product at the right place, time, and price. To improve their chances, they have to optimize their marketing strategy and increase their brand awareness and online presence. Most companies focus on three things to reach their customers online:
- Their website: the fundamental information customers need to see
- Search engine optimization: helps customers find the website and information about the company
- Social media (Facebook, Twitter, YouTube, etc.): builds engagement and awareness
This project focuses on one channel of the third item, advertising on YouTube videos, and asks how many potential customers can be reached by purchasing advertisements on a particular video. A ranking system was created to classify videos into categories based on their potential to go viral. The definition of a “black belt,” or viral, video is subjective and can be adjusted by the user. The project could also help YouTube detect view count hacking, a common problem for which YouTube had to rely on manual review at certain thresholds to verify that view counts were authentic.
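As a rough illustration of such a ranking system, the sketch below bins videos into tiers by view count. Only the “black belt” label comes from the description above; the other tier names and all thresholds are hypothetical placeholders, not the values used in the project.

```python
from bisect import bisect_right

# Hypothetical view-count boundaries; change them to match your own
# definition of "viral".
THRESHOLDS = [10_000, 100_000, 1_000_000]

# Only "black belt" is named in the talk description; the rest are placeholders.
TIERS = ["white belt", "green belt", "brown belt", "black belt"]


def rank_video(view_count: int) -> str:
    """Return the tier whose view-count range contains view_count."""
    if view_count < 0:
        raise ValueError("view_count must be non-negative")
    # bisect_right counts how many thresholds are <= view_count,
    # which is exactly the index of the matching tier.
    return TIERS[bisect_right(THRESHOLDS, view_count)]


if __name__ == "__main__":
    for views in (500, 10_000, 250_000, 3_000_000):
        print(views, rank_video(views))
```

Keeping the thresholds in a single list makes the subjective part of the definition easy to change without touching the labeling logic.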
From this project, you'll learn:
- Getting data through the YouTube API
- Data wrangling
- Feature engineering and selection
- Pipelines and NLP union transformations
- A range of ML models (LR, Random Forest, GBM, etc.)
- Tuning with grid and random search CV
- Model evaluation and next steps
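To give a flavor of the first few topics, here is a minimal sketch of requesting video metadata from the YouTube Data API v3 `videos` endpoint and flattening one result into a feature row. It builds the request URL but does not call the network. The sample response is made up for illustration, and the derived features (`like_rate`, `title_words`, and so on) are hypothetical choices, not necessarily the ones used in the project.

```python
import json
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_request_url(video_ids, api_key):
    """Build a videos.list request URL asking for snippet and statistics."""
    params = {
        "part": "snippet,statistics",
        "id": ",".join(video_ids),
        "key": api_key,
    }
    return f"{API_URL}?{urlencode(params)}"


def to_feature_row(item):
    """Flatten one API item into structured and text-derived features."""
    stats = item.get("statistics", {})
    snippet = item.get("snippet", {})
    # The API returns counts as strings, so convert before doing arithmetic.
    views = int(stats.get("viewCount", 0))
    likes = int(stats.get("likeCount", 0))
    comments = int(stats.get("commentCount", 0))
    title = snippet.get("title", "")
    return {
        "video_id": item.get("id"),
        "views": views,
        "like_rate": likes / views if views else 0.0,
        "comment_rate": comments / views if views else 0.0,
        "title_words": len(title.split()),
        "title_has_question": "?" in title,
    }


# Made-up response in the shape the endpoint returns, for illustration only.
SAMPLE_RESPONSE = json.loads("""
{"items": [{"id": "abc123",
  "snippet": {"title": "Can a cat play piano?"},
  "statistics": {"viewCount": "2000", "likeCount": "100", "commentCount": "20"}}]}
""")

if __name__ == "__main__":
    print(build_request_url(["abc123"], "YOUR_API_KEY"))
    for item in SAMPLE_RESPONSE["items"]:
        print(to_feature_row(item))
```

Rows like these can then be collected into a table for wrangling, feature selection, and modeling.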
Addi Wei is currently a data scientist with a pharmaceutical company in the Atlanta area. He has a B.S. in computer science, a master's in industrial engineering, and more than 10 years of combined professional experience at companies such as IBM and Siemens in roles such as product and project management, and 3 years of data engineering and data science experience.
As a data scientist, Addi supports his company's Sales and Marketing function using frameworks and tools such as Apache Spark, Databricks, and Qubole. Addi enjoys sharing his knowledge to help others while learning about progress and new technologies in the field.
($10 in cash will be collected at the event to cover the venue cost. Coffee, tea, and snacks are included.)
