
Data Workflows for Machine Learning

Hosted by Tony T. and David A.

Details

Main Talk: Data Workflows for Machine Learning

Speaker: Paco Nathan (http://liber118.com/pxn/)

Abstract:

We compare and contrast several open source frameworks that have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Python libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, and MBrace on .NET. The analysis identifies several "best of breed" points and the features that would be valuable to see across many frameworks, leading up to a "scorecard" to help evaluate the alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.

Speaker bio:

Paco Nathan (http://en.wikipedia.org/wiki/Paco_Nathan) is a “player/coach” who has led innovative Data teams building large-scale apps for 10+ years, and has worked as an OSS evangelist for the past 2+ years. He is an expert in distributed systems, machine learning, cloud computing, and functional programming, with a focus on Enterprise data workflows. Paco is an O'Reilly (http://oreilly.com/) author and an advisor for several firms, including The Data Guild (http://thedataguild.com/) and Zettacap (http://www.zettacap.com/). Paco received his BS in Math Sci and MS in Comp Sci from Stanford University, and has 30+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.

Tentative Schedule:

6:30-7:00 - socializing

7:00-8:00 - main talk

8:00-8:30 - socializing

Special thanks:

Climate Corporation for hosting!

SF Bayarea Machine Learning
The Climate Corporation
201 3rd Street Suite 1100 · San Francisco, CA