Main Talk: Data Workflows for Machine Learning
Speaker: Paco Nathan
We compare and contrast several open source frameworks that have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Python libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, and MBrace on .NET. The analysis develops several points for "best of breed" and identifies features that would be great to see across the board in many frameworks, leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
Paco Nathan is a “player/coach” who has led innovative Data teams building large-scale apps for 10+ years, and has worked as an OSS evangelist for the past 2+ years. He is an expert in distributed systems, machine learning, cloud computing, and functional programming -- with a focus on Enterprise data workflows. Paco is an O'Reilly author, and an advisor for several firms including The Data Guild and Zettacap. Paco received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 30+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.
6:30-7:00 - socializing
7:00-8:00 - main talk
8:00-8:30 - socializing
Thanks to Climate Corporation for hosting!