Feature Selection w/ Spark: Bridging the Gap Between Data Engineering & Science


Details
Picking the right inputs for your data science models is a key requirement for accurate models. The traditional techniques that worked on EDWs/EDMs no longer work when dealing with billions of records, each with several thousand attributes/features. First, we’ll discuss how data engineers and data scientists can get on the same page when dealing with large datasets. Then we’ll dive into using sparse vectors and feature selection algorithms to compress and reduce your data sets. The end result is that you can achieve models that are as accurate as models using the full data set, in a fraction of the time and at a fraction of the cost. This talk is for students who are yearning to get into data science as well as for data engineers who are considering the leap from engineering into a more data science focused role.
Murray Webb Bio:
Murray has 7 years of experience in statistical modeling within media and advertising and is currently a data science lead at IgnitionOne. Murray began his career at Comcast in the analytics group after completing a Master’s in Applied Statistics from Kennesaw State University in 2012. Murray began his undergraduate coursework at Hampden-Sydney College, focusing on economics, rhetoric, and mathematics. Murray is professionally interested in machine learning, data visualization, and product creation. When not working, Murray’s hobbies include fishing, music, and reading.
Kyle Burke Bio:
Kyle has over 10 years of experience building enterprise data warehouses and data marts. In the last two years he’s been focused on big data solutions using Apache Spark. He recently joined the data science team at IgnitionOne and is now focused on building data science models that predict conversions and clicks in batch and in real time.
Please join us for food, drinks, and socializing at 6:30pm.
There is a paid parking deck in the Investco building. There are also street parking spots all around the IgnitionOne building.
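
For anyone who wants a feel for the sparse vector and feature selection ideas mentioned in the talk description before attending, here is a minimal plain-Python sketch. It does not use Spark and is not the speakers' method. The function names (to_sparse, feature_variances, select_features, project) and the choice of ranking features by variance are assumptions made purely for illustration.

```python
from collections import defaultdict


def to_sparse(dense):
    """Store only the non-zero entries of a row as {column_index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def feature_variances(rows, num_features):
    """Population variance of each column, computed from sparse rows only."""
    n = len(rows)
    sums = defaultdict(float)
    sq_sums = defaultdict(float)
    for row in rows:
        for i, v in row.items():
            sums[i] += v
            sq_sums[i] += v * v
    # Columns that never appear are all zeros, so their variance is 0.
    return [sq_sums[i] / n - (sums[i] / n) ** 2 for i in range(num_features)]


def select_features(rows, num_features, top_k):
    """Return the indices of the top_k highest-variance columns (illustrative criterion)."""
    variances = feature_variances(rows, num_features)
    ranked = sorted(range(num_features), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:top_k])


def project(row, kept):
    """Drop unselected columns and renumber the kept ones from 0."""
    remap = {old: new for new, old in enumerate(kept)}
    return {remap[i]: v for i, v in row.items() if i in remap}


# Toy data: column 0 and 2 are always zero, column 4 is constant.
dense_rows = [
    [0, 3.0, 0, 0, 1.0],
    [0, 0, 0, 0, 1.0],
    [0, 5.0, 0, 2.0, 1.0],
    [0, 0, 0, 0, 1.0],
]
sparse_rows = [to_sparse(r) for r in dense_rows]
kept = select_features(sparse_rows, num_features=5, top_k=2)
reduced = [project(r, kept) for r in sparse_rows]
print(kept)
print(reduced)
```

The sparse form stores only the non-zero cells, and the selection step drops columns that carry little signal. A real pipeline on Spark would use its own vector types and selectors rather than these hand-rolled helpers.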
