Implementing and consuming machine learning (ML) techniques at scale is difficult for ML developers and domain experts alike. MLbase, an open-source project developed by the AMPLab at UC Berkeley and the Database Group at Brown, is a platform that addresses the needs of both groups. In this talk, Ameet Talwalkar and Evan Sparks will describe the components of the system, including a low-level distributed machine learning library in Spark, an API for machine learning algorithms and feature extractors, and recent work on higher-level functionality to autotune basic ML pipelines.
More on MLbase can be found here.
Spark is part of the Berkeley Data Analytics Stack (BDAS), which enables interactive and streaming analytics on large-scale datasets. More can be found here.
Ameet Talwalkar is an NSF post-doctoral fellow in the Computer Science Division at UC Berkeley. His work addresses scalability and ease-of-use issues in the field of machine learning, as well as applications related to large-scale genomic sequencing analysis. He obtained a bachelor's degree from Yale University and a Ph.D. from the Courant Institute at New York University.
Evan Sparks is a Ph.D. student in the Computer Science Division at UC Berkeley. His research focuses on the design and implementation of distributed systems for large-scale data analysis. Prior to Berkeley, he spent several years in industry tackling large-scale data problems as a Quantitative Financial Analyst at MDT Advisers and as a Product Engineer at Recorded Future. He holds a bachelor's degree from Dartmouth College.