Topic: Data Processing for Machine Learning / Artificial Intelligence / Natural Language Processing
Speakers: Robin Tully, Matthew Brown, Otto Barnes
Food & Networking: We'll have pizza and soda from 6:00 to 6:45
Robin Tully developed his passion for machine learning at the University of Oxford. In prior work, he has developed intelligent chatbots for Sony Pictures, with one film-promotion product driving 10 million interactions worldwide and scaling up to 12,000 new users per hour. He has also worked with Internet artist Poppy, allowing users to interact with a representation of the artist, which drove 15 million messages in 2 weeks. Currently, he is investigating ways in which artificial intelligence can drastically enrich the immigration application process as a machine learning engineer at Excella Consulting.
Robin Tully will present how Data Wrangling for Natural Language Processing entails its own unique set of challenges and opportunities. This talk will explore the ways in which data can be prepared and transformed to be properly and expediently ingested by neural networks in TensorFlow. It highlights the path from text to mathematical vectors, through training pipelines on a computational graph, and back into text.
- Necessity of word embeddings for model training
- Chunking input for model ingestion
- Queuing for model training
- Bundling a trained model
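As a rough illustration of the first two bullets (not taken from the talk itself), the sketch below turns text into integer ids, which is the form an embedding layer looks up, splits the ids into fixed-size chunks, and maps a chunk back into text. It uses plain Python rather than TensorFlow, and the helper names, the special tokens `<pad>` and `<unk>`, and the sample sentences are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical special tokens: padding and out-of-vocabulary words.
PAD, UNK = "<pad>", "<unk>"

def build_vocab(sentences):
    """Map every whitespace token seen in the corpus to an integer id."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    tokens = [PAD, UNK] + sorted(counts)
    return {tok: i for i, tok in enumerate(tokens)}

def encode(sentence, vocab):
    """Text -> list of ids; unseen words fall back to the UNK id."""
    return [vocab.get(tok, vocab[UNK]) for tok in sentence.lower().split()]

def chunk(ids, size, pad_id=0):
    """Split ids into fixed-length chunks, padding the last one."""
    chunks = [ids[i:i + size] for i in range(0, len(ids), size)]
    if chunks and len(chunks[-1]) < size:
        chunks[-1] = chunks[-1] + [pad_id] * (size - len(chunks[-1]))
    return chunks

def decode(ids, vocab):
    """List of ids -> text, dropping padding."""
    inverse = {i: tok for tok, i in vocab.items()}
    return " ".join(inverse[i] for i in ids if inverse[i] != PAD)

corpus = ["the film opens friday", "see the film"]
vocab = build_vocab(corpus)
ids = encode("see the new film friday", vocab)
batches = chunk(ids, size=3)
print(batches)
print(decode(batches[1], vocab))
```

In a real pipeline the integer ids would index rows of a trained embedding matrix; the fixed chunk size is what lets the batches be stacked into a single tensor.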
Sales Forecasting: Data Processing with Machine Learning for Time Series Information
Matt Brown has five years of professional experience in data science, project management and management consulting. He currently works as a data scientist at Red Oak Strategic where he helps clients better utilize their data assets through modeling, visualization and application development. Prior to that, he spent two years working as a management consultant at Booz Allen Hamilton. He enjoys using data to predict the future.
Matt will be speaking about processing data for time-series analysis using machine learning. He will walk through the preprocessing steps he completed for a recent Kaggle competition, Corporación Favorita Grocery Sales Forecasting.
- Data Cleansing: Finding and correcting or removing inaccurate entries
- Data Imputation: How do you handle missing values in your data? What are the key things to think about when imputing time series data?
- Feature Engineering: What features do you include so your model 'understands' that the data has a time component (e.g. lag features, sliding window mean / median / max etc.)? How do you create these features?
- Handling categorical, ordinal and numeric data for machine learning.
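To make the imputation and feature engineering bullets concrete, here is a minimal sketch, assuming a single daily sales series with gaps marked as `None`. It is not the speaker's competition code; the function names and the toy numbers are invented for illustration, and forward fill is only one of several possible imputation choices.

```python
def forward_fill(values):
    """Replace each missing value with the most recent observed one.
    Leading gaps stay None because there is nothing earlier to copy."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def lag(values, k):
    """Shift the series k steps into the past: row t sees the value from t - k."""
    return ([None] * k + list(values))[:len(values)]

def rolling_mean(values, window):
    """Mean of the previous `window` values, excluding the current one,
    so the feature never leaks the value being predicted."""
    out = []
    for i in range(len(values)):
        past = values[max(0, i - window):i]
        out.append(sum(past) / window if len(past) == window else None)
    return out

sales = [10, None, 14, 12, None, 20]      # toy data, not from the competition
clean = forward_fill(sales)
features = list(zip(clean, lag(clean, 1), rolling_mean(clean, 2)))
for row in features:
    print(row)
```

Forward fill only looks backward in time, which is one reason it is often preferred over a global mean for time series: a global mean would use future observations to fill past gaps.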
CrowdFlower: Wrangling Truth from a Crowd on the Cloud
Otto Barnes is an avid technologist and has been poking around at Machine Learning and Deep Learning for the past few years. His 12 years of software experience and Computer Engineering grad-school work put him in the middle of the production vs. research camps in his role at AddThis.com, now a part of the Oracle Data Cloud.
CrowdFlower (like Amazon Mechanical Turk and others) enables Data Scientists to build truth sets quickly by harnessing crowdsourced data. It's not without its issues, however. We will investigate a few ways to approach gathering data and discuss pitfalls and tools to wrangle the truth. Specifically, I will demo discriminating a truth set for actor photos for an epic 2018 project.
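One common way to wrangle a truth set from noisy crowd judgments is majority voting with an agreement threshold. The sketch below shows that idea under stated assumptions: it is not CrowdFlower's own aggregation method, and the function name, the 0.7 threshold, and the photo and actor labels are hypothetical.

```python
from collections import Counter

def aggregate(judgments, min_agreement=0.7):
    """Keep an item's majority label only when enough contributors agree.

    judgments: dict mapping item id -> list of labels from different workers.
    Returns (truth, disputed): accepted labels and items needing review.
    """
    truth, disputed = {}, []
    for item, labels in judgments.items():
        if not labels:
            disputed.append(item)          # nobody labeled this item
            continue
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            truth[item] = label
        else:
            disputed.append(item)          # too much disagreement
    return truth, disputed

# Hypothetical judgments: which actor appears in each photo?
judgments = {
    "photo_1": ["actor_a", "actor_a", "actor_a"],
    "photo_2": ["actor_a", "actor_b", "actor_b"],
    "photo_3": ["actor_b", "actor_a"],
}
truth, disputed = aggregate(judgments)
print(truth)
print(disputed)
```

Disputed items can be sent back for more judgments or reviewed by hand, which trades cost against the quality of the final truth set.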