Data Workflows for Machine Learning


Details
Main Talk: Data Workflows for Machine Learning
Speaker: Paco Nathan (http://liber118.com/pxn/)
Abstract:
We compare and contrast several open source frameworks that have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Py libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, and MBrace on .NET. The analysis develops several "best of breed" points and identifies features that would be great to see across many frameworks, leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
Speaker bio:
Paco Nathan (http://en.wikipedia.org/wiki/Paco_Nathan) is a “player/coach” who has led innovative Data teams building large-scale apps for 10+ years and has worked as an OSS evangelist for the past 2+ years. He is an expert in distributed systems, machine learning, cloud computing, and functional programming, with a focus on Enterprise data workflows. Paco is an O'Reilly (http://oreilly.com/) author and an advisor for several firms, including The Data Guild (http://thedataguild.com/) and Zettacap (http://www.zettacap.com/). Paco received his BS Math Sci and MS Comp Sci degrees from Stanford University and has 30+ years of technology industry experience, ranging from Bell Labs to early-stage start-ups.
Tentative Schedule:
6:30-7:00 - socializing
7:00-8:00 - main talk
8:00-8:30 - socializing
Special thanks:
Climate Corporation for hosting!
