Transfer learning with Spark + IBOSS Data Reduction Method


Details
QuantumBlack has agreed to sponsor our February Meetup.
There will be a couple of talks and lots of networking, along with pizza and beer!
Please bring an ID to get into the building.
Agenda:
6:00pm - 6:30pm
Meet and greet. Networking.
6:30pm - 7:00pm
Transfer learning through Spark DL pipelines - Talk by Vishal Rajpal
7:00pm - 7:30pm
On Data Reduction of Big Data - Talk by Min Yang
7:30pm - 8:00pm
Q/A with Speakers and QuantumBlack
Title:
Transfer learning through Spark Deep Learning pipeline
Speaker:
Vishal Rajpal
Abstract:
Transfer learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks.
With the amount of time and data required to train neural nets, transfer learning is becoming more relevant as we try to leverage existing models.
We will look at the Spark Deep Learning pipeline, with code snippets, to understand the available API options.
Speaker Bio:
Vishal Rajpal is a Principal Data Engineer at QuantumBlack. He oversees architecture, information security and data engineering for analytics and machine learning development projects.
He specializes in applying best practices from product and design engineering to speed up analytics development and deployment.
Prior to QuantumBlack, Vishal worked at Fractal Analytics, MSCI and Accenture.
----
Title:
On Data Reduction of Big Data
Speaker:
Min Yang
Abstract:
The big data paradigm has drawn significant attention in recent years as the costs of acquiring and storing data have plummeted. The bottleneck has instead shifted to fast, in-depth analysis. This shift has created its own set of problems, the most obvious of which is that large datasets are often computationally expensive to process: proven statistical methods can no longer be applied to extraordinarily large datasets because of computational limitations. A critical step in big data analysis is therefore data reduction. In this presentation, I will review some existing approaches to data reduction and introduce a new strategy called information-based optimal subdata selection (IBOSS). For both linear and nonlinear model setups, theoretical results and extensive simulations demonstrate that IBOSS outperforms other approaches in terms of parameter estimation and predictive performance. The tradeoff between accuracy and computational cost is also investigated. When models are misspecified, the performance of different data reduction methods is compared through simulation studies. Some ongoing research, as well as some open questions, will also be discussed.
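For readers new to IBOSS, the following is a minimal sketch of the idea in its linear-regression form, written for this page and not taken from the talk. It assumes NumPy, a numeric covariate matrix `X` with `n >= k` rows, and a subdata budget `k` of at least twice the number of covariates `p`. For each covariate in turn, it keeps the `r = k // (2p)` not-yet-selected rows with the smallest values and the `r` with the largest, then fits ordinary least squares on that subdata. The intuition is that extreme covariate values carry the most information about the slopes of a linear model. The function names `iboss_select` and `fit_ols` are hypothetical.

```python
import numpy as np


def iboss_select(X: np.ndarray, k: int) -> np.ndarray:
    """Return row indices of an IBOSS-style subsample of (up to) k rows.

    Hypothetical sketch: for each covariate, take the r smallest and r largest
    values among rows not already selected, with r = k // (2 * p).
    """
    n, p = X.shape
    r = k // (2 * p)  # rows taken from each tail of each covariate
    if r < 1:
        raise ValueError("k must be at least 2 * number of covariates")
    if n < 2 * p * r:
        raise ValueError("need at least k rows to select from")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        candidates = np.flatnonzero(available)
        # Full sort for clarity; a partition-based selection keeps this linear in n.
        order = np.argsort(X[candidates, j], kind="stable")
        picked = np.concatenate([candidates[order[:r]], candidates[order[-r:]]])
        available[picked] = False  # never pick the same row twice
        chosen.append(picked)
    return np.concatenate(chosen)


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Usage on simulated data: 100,000 rows reduced to 1,000 before fitting.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 5))
beta_true = np.array([1.0, 0.5, -2.0, 3.0, 0.0, 1.5])
y = beta_true[0] + X @ beta_true[1:] + rng.normal(size=len(X))
idx = iboss_select(X, k=1000)
beta_hat = fit_ols(X[idx], y[idx])
```

The fit touches only 1% of the rows. Choosing which rows to keep, rather than sampling them at random, is what distinguishes IBOSS from uniform subsampling.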
Speaker Bio:
Min Yang is a Professor of Statistics at the University of Illinois at Chicago (UIC). Before joining UIC in 2012, he was an Assistant Professor at the University of Nebraska-Lincoln from 2002 to 2005 and an Assistant and Associate Professor at the University of Missouri from 2005 to 2012. He received his PhD from UIC in 2002. His primary research areas are subdata selection in big data analysis and optimal design of experiments, work that is mainly supported by the NSF. He won the prestigious NSF CAREER award in 2008 and has published more than 10 papers in the Annals of Statistics and JASA. He currently serves as an Associate Editor for five statistical journals, including JASA and Statistica Sinica.
----
Sponsors:
This event is sponsored by QuantumBlack, a McKinsey Company.
