December Tutorial: Scalable Machine Learning in R & Python with H2O
Hosted by R-Ladies London
Details
Presenting R-Ladies London December Tutorial: Scalable Machine Learning in R & Python with H2O!!
REGISTER ON EVENTBRITE: https://www.eventbrite.co.uk/e/r-ladies-dec-tutorial-scalable-machine-learning-in-r-python-with-h20-tickets-29702018537?aff=es2
P.S. The Meetup RSVP will remain closed, as we will be using Eventbrite ONLY to manage registrations. Having two sign-up platforms always causes lots of confusion!!
Timings: talk starts at 6.15pm, ends 8pm
-------------------------------------------------------------------------------------
Our speaker is Dr. Erin LeDell (http://www.stat.berkeley.edu/~ledell/), fellow Co-founder of the R-Ladies Global community and an Organiser of R-Ladies San Francisco, who is stopping over in London to give this talk! She is a Statistician & Machine Learning Scientist at H2O.ai (http://www.h2o.ai/), the company that produces the open source machine learning platform H2O (https://github.com/h2oai/h2o-3). She is the author of a handful of machine learning software packages (http://www.stat.berkeley.edu/~ledell/software.html), including h2oEnsemble (https://github.com/h2oai/h2o-3/tree/master)!
The focus of this talk is scalable machine learning using the H2O R (http://www.h2o.ai/download/h2o/r) and Python (http://www.h2o.ai/download/h2o/python) packages. H2O (https://github.com/h2oai/h2o-3) is an open source, distributed machine learning platform designed for big data, with the added benefit that it's easy to use on a laptop as well as on a multi-node Hadoop or Spark cluster. The core machine learning algorithms of H2O are implemented in high-performance Java, but fully-featured APIs are available in R, Python, Scala, and REST/JSON, as well as through a web interface.
R and Python code examples for machine learning with H2O will be demoed live and are available on GitHub (https://github.com/h2oai/h2o-tutorials#r-tutorials) for attendees to follow along on their laptops.
Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of Generalized Linear Models, Gradient Boosting Machines, Random Forest, Deep Neural Nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others. The ability to create stacked ensembles, or "Super Learners", from a collection of supervised base learners is provided via the h2oEnsemble (https://github.com/h2oai/h2o-3/tree/master/h2o-r/ensemble) R package.
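For anyone curious about the stacking idea before the talk, here is a rough sketch in plain Python (standard library only). It is not the h2oEnsemble implementation and does not use H2O at all: the two toy base learners, the function names (fit_mean, fit_line, cv_predictions, fit_stack) and the tiny dataset are all made up for illustration. It only shows the general pattern: get out-of-fold predictions from each base learner, then fit a simple metalearner (here a single blending weight) on those predictions.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Toy base learner 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy base learner 2: one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cv_predictions(fit, xs, ys, k):
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def fit_stack(xs, ys, k=3):
    """Metalearner: blend w * line + (1 - w) * mean, with w fitted on out-of-fold predictions."""
    p_mean = cv_predictions(fit_mean, xs, ys, k)
    p_line = cv_predictions(fit_line, xs, ys, k)
    # Least-squares solution for the single weight w, clipped to [0, 1].
    d = [a - b for a, b in zip(p_line, p_mean)]
    r = [y - b for y, b in zip(ys, p_mean)]
    denom = sum(v * v for v in d)
    w = sum(a * b for a, b in zip(d, r)) / denom if denom else 0.5
    w = min(1.0, max(0.0, w))
    # Refit both base learners on all the data for the final ensemble.
    m_mean, m_line = fit_mean(xs, ys), fit_line(xs, ys)
    return w, (lambda x: w * m_line(x) + (1 - w) * m_mean(x))


# Made-up, perfectly linear data: the metalearner should favour the line.
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]
weight, ensemble = fit_stack(xs, ys)
print(weight, ensemble(20.0))
```

On this toy data the blending weight ends up favouring the line learner, since its out-of-fold predictions match the targets. Real Super Learners use stronger base learners (GLM, GBM, Random Forest, etc.) and a proper metalearning algorithm, but the cross-validate-then-combine pattern is the same.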