Riding on large data with Scikit-Learn Out-of-Core Algorithms


6:30 to 7:00 PM - Networking

7:00 to 7:15 PM - Lightning Talks

• "Quantifying a meal" - Miguel Alonso MD PhD

• "Edit-run-repeat: Stop the cycle of pain" - Peter Bull

7:15 to 8:00 PM - Talk and Q&A

• "Riding on large data with scikit-learn out-of-core algorithms" - Alex Perrier

In recent years, Data Science has focused primarily on Big Data tools and analytics. At the same time, Small Data platforms have steadily improved, providing end users with better tools for analysis and insight.

But what about data that doesn't fit neatly into the current categorization of Small or Big Data? What if you had data that was too big to fit into the memory of the local machine and too small to justify Big Data solutions?

This in-between category is termed Large Data, and several solutions exist to overcome this quagmire. Amazon EC2, H2O, Dato's GraphLab Create and R Streaming Package all provide services with Large Data in mind.

However, an often overlooked tool is Scikit-Learn. Scikit-Learn is an open source machine learning library for the Python programming language. Although it was developed with smaller data in mind, it nevertheless provides a decent set of algorithms for out-of-core classification, regression, clustering and decomposition.
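For readers unfamiliar with the idea, out-of-core learning in Scikit-Learn generally means feeding an estimator one mini-batch at a time through its partial_fit method, so the full dataset never has to sit in memory. The following is only a minimal sketch and not taken from the talk: the batches helper and the synthetic data are hypothetical stand-ins for chunks that would normally be read from disk.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def batches(n_batches, batch_size, rng):
    """Yield synthetic (X, y) mini-batches.

    Hypothetical stand-in for reading successive chunks of a large file.
    """
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 2))
        # Label is 1 when the two features sum to a positive number.
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        yield X, y


rng = np.random.default_rng(0)
clf = SGDClassifier(random_state=0)

# partial_fit must be told every possible class label on the first call,
# because a single batch is not guaranteed to contain all of them.
classes = np.array([0, 1])

# Only one mini-batch is held in memory at any time.
for X_batch, y_batch in batches(50, 200, rng):
    clf.partial_fit(X_batch, y_batch, classes=classes)

# Evaluate on a fresh batch the model has never seen.
X_test = rng.normal(size=(1000, 2))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

SGDClassifier is used here only because it is one of the estimators that supports partial_fit; other incremental estimators follow the same loop.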

In this talk, Alex Perrier will focus on scikit-learn out-of-core algorithms and explore their performance in the context of large data.

About Speaker:

Alex Perrier is a Data Scientist and Software Engineer with a strong Mathematical and Signal Processing background, solid experience in Agile software development and a passion for Stochastic Processes. He currently works for Berklee Online as Data and Software Lead. Alex holds a Ph.D. from Telecom ParisTech, with a focus on Signal Processing, Mathematics and Stochastic Processes.