Past Meetup

Large Scale Machine Learning with Python + Building data products

255 people went

Spencer Aiello wrote the Python & R rapids architecture. Spencer is a data scientist turned software engineer. His talks are filled with examples from scikit and Python. His official bio says, "Spencer is a tall drink of H2O: mysterious and furry. He enjoys long bike rides in the scorching sun. He has no life beyond H2O."

Cliff Click - CTO

Python is increasingly becoming the language of choice for data scientists and machine learning practitioners. However, traditional Python libraries for data munging, cleaning, transformation, and predictive modeling are not built for distributed computing, and they are either extremely slow or fail on massive datasets. Now there’s a new set of Python tools that enable tera-scale feature munging from the REPL or a script, along with the ability to run state-of-the-art ML algorithms (including Deep Learning, GBM, GLM, Random Forest, PCA, and more) on these massive datasets using the power of distributed computing. We'll open with a brief discussion of H2O's Python integration and then dive directly into an ML demo, showing feature munging of various datasets, group-bys and joins, and then taking the massaged data into several different ML algorithms to get the most predictive model, all from Python.