Paco Nathan: Data Workflows for Machine Learning

This month, we're going to try something different for the inaugural meeting at the permanent Seattle Twitter office: a full-length talk by Paco Nathan.

Title: Data Workflows for Machine Learning

Abstract:
A variety of tools and frameworks for large-scale data workflows have emerged, with substantial impact on machine learning practices in industry. ML work can now be integrated more readily into a wide range of other frameworks and migrated across environments. An example case is to train a model in SAS on a data sample, export the model as PMML, then run it at scale on a Hadoop cluster (sans license fees) based on Cascading/Pattern. Other great examples include: KNIME (R, Weka, Eclipse, Hadoop, Actian, etc.); ADAPA from Zementis in Amazon AWS workflows; and in the Python stack an ecosystem of Augustus, scikit-learn, Pandas, IPython, etc. In the emerging category, there is Spark/MLbase, and also Julia with a variety of integrations. Spark and Scala integrations become quite interesting in the broader context of Summingbird and Algebird, indicating how some notions of workflow could be generalized. This talk compares and contrasts these different workflow approaches, along with some perspectives on use cases and indications, plus where they appear to be heading.

About the speaker:
Paco Nathan is an O'Reilly author ("Enterprise Data Workflows with Cascading") and an advisor for The Data Guild in Palo Alto, CA. He was formerly a lead dev on the "Pattern" open source project for PMML scoring in Cascading, and teaches "Intro to Machine Learning" and "Intro to Data Science" courses based on R, Python, Scala, etc.

  • Paco N.

    Many thanks to all who attended, and especially to Jake and Twitter for hosting in the fantastic new office. I really appreciate the opportunity to present, and am grateful for so many great discussions about data workflows and tech in general! :)

    Just uploaded the slides here:
    http://www.slideshare.net/pacoid/data-workflows-for-machine-learning

    3 · January 30, 2014

    • Dylan D

      I've worked with large enterprises. It's amazing how little importance is given to workflow as a key design element from the get-go (can't be helped when Excel takes center stage).
      ML workflows are the silver bullet for efficiency, accuracy, and repeatability and this presentation will help in a huge way! Thanks, Paco.

      1 · January 30, 2014

    • Paco N.

      Well said Dylan!

      January 31, 2014

  • Silvia V.

    A great speaker and a much needed capable review of current popular ML workflows. Thanks for organizing!

    1 · January 30, 2014

    • Paco N.

      Thanks Silvia! Let's discuss more about how best to characterize R and its env. I want to fix this scorecard :)

      January 30, 2014

  • Dylan D

    Great conversations yesterday, looking forward to seeing the deck.

    2 · January 30, 2014

  • Jody L.

    Great talk last night. Lots of interesting projects to investigate. Thanks to Paco and Twitter.

    1 · January 30, 2014

  • A former member

    Thank You

    1 · January 30, 2014

  • Phillip B.

    Nice talk. Pertinent. Workflow for ML is a first sign of maturity of ML in the (some!) enterprises. Good to know state of the latest on workflow.

    1 · January 30, 2014

  • Pat T.

    Just FYI, this post over in the useR meetup is re. a new distributed R platform: http://www.meetup.com/Seattle-useR/boards/view/viewthread?thread=41465582

    January 30, 2014

  • Luke S.

    Great - thank you!

    1 · January 30, 2014

  • David Z.

    Fantastic talk. I'd love to attend a class, and do a deep dive. Thanks for organizing; I can't wait for the next one!

    2 · January 30, 2014

  • Pat T.

    Very informative! Loved the insights into important, but often overlooked, considerations when constructing a machine learning workflow.

    1 · January 29, 2014

  • Chris M.

    Agreed, great talk. Thanks Paco and Twitter team. Looking forward to revisiting the slide deck. Can I assume that will be linked here?

    2 · January 29, 2014

  • John L.

    The breadth and depth of this talk was amazing. Quite the drink from the firehose. Excellent!

    2 · January 29, 2014

  • A former member

    Excellent talk!

    1 · January 29, 2014

  • Jake M.

    I'll make sure to post a link to the slides later, folks!

    2 · January 29, 2014

  • Aeden J.

    Will the slide deck be posted somewhere?

    January 29, 2014

  • Jonathan

    We are stuck on the 3rd floor

    January 29, 2014

  • A former member

    I would like to see the slides as well. Unfortunately, couldn't make it.

    January 29, 2014

  • Jordan G.

    A few of us are downstairs on the 3rd floor. Can someone head down to let us in?

    January 29, 2014

  • Mike L.

    2nd the slide request

    1 · January 29, 2014

  • Alex K.

    Please add a link to slides following the talk?

    Stuck on the Eastside :-(

    2 · January 29, 2014

  • Kushal L.

    Will not be able to make it. Stuck at work. Sorry.

    January 29, 2014

  • Jake M.

    Ok folks, so I'm going to be there and letting people up to Twitter's Commons area on the 20th floor starting around 6:15 or 6:30, when the pizza should arrive. Bring maybe $5 if you plan on having pizza / beer, and the talk will start pretty promptly at 7pm.

    January 29, 2014

  • Jake M.

    So there have been some questions about food. Normally, I buy pizza and bring beer. Feeding 165 of you, on the other hand, seems like it would get prohibitively expensive.

    I could still buy pizza and beer and ask for a $5 donation for everyone who wants some?

    19 · January 22, 2014

    • Jake M.

      Yeah, it'll be way lower, but sure. Anyone reading this! Please "like" my comment above about chipping in for food, if you want me to bring enough pizza for you.

      4 · January 22, 2014

    • A former member

      Jake, I have come down with a cold and won't be attending. I've unliked the comment to update the pizza count. If it wasn't in time and you've already ordered, please let me know and I'll find a way to get my portion to you.

      January 29, 2014

  • Jaime Lyn S.

    :( sorry to bail. Hope to make the next one!

    January 28, 2014

  • Aaron s.

    Hey, I'm stuck on the 3rd floor. Can someone let me in please ... Security won't ...

    January 22, 2014

    • Joseph W.

      I think you have the wrong week, my friend...

      January 22, 2014

  • A former member

    How about parking? Is it easy to find parking around there?

    January 22, 2014

    • Jake M.

      After 6pm it's $6 to park at Century Square itself. Entrance is on Pike, between 3rd and 4th.

      January 22, 2014

  • A former member

    Really looking forward to this.

    1 · January 21, 2014

  • Joseph A.

    HOLY COW! 163 RSVPs. That's awesome!! Too bad I'm out of town this week. Would have loved to come.

    January 21, 2014

  • Dan J

    New to the group. Is there food or should I grab dinner before? Is the presentation at 7pm with food/social before?

    January 16, 2014

  • Andreas M.

    That topic sounds awesome! Too bad I'm on the wrong side of the Atlantic :-/

    January 6, 2014

  • Michael J.

    I look forward to attending this meeting and to learning something new about these technologies!

    1 · January 6, 2014

Our Sponsors

  • Dato

    Dato is sponsoring yummy food/drinks.
