Modeling Big Data with Better Algorithms and H2O

How to Stop Worrying and Start Modeling Big Data with Better Algorithms and H2O

Data modeling has long been constrained by scale, and sampling still rules the day for ad hoc analytics. Scale brings much-needed change to the modeling world. In this talk we present the predictive power of using sophisticated algorithms on big datasets. Large data sizes bring the particularly hard problem of unbalanced data with multiple asymmetrically rare classes. Missing features pose unique problems for most classification and regression algorithms, and proper handling can lead to greater predictive power. In the race for better predictions, H2O makes practical techniques accessible to anyone through an easy-to-use software product.
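One common way to handle unbalanced data with several rare classes is to weight each class inversely to its frequency. The sketch below only illustrates that general idea and is not taken from the talk or from H2O; the function name `balanced_class_weights` and the example labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights, so every class contributes the same
    total weight to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label column with two asymmetrically rare classes.
labels = ["ok"] * 90 + ["fraud"] * 8 + ["outage"] * 2
weights = balanced_class_weights(labels)
```

With these weights, the 2 "outage" rows carry the same total weight as the 90 "ok" rows, which keeps a model from ignoring the rare classes.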

H2O is an open source math and machine learning engine for big data that brings distribution and parallelism to powerful algorithms while keeping the widely used languages of R and JSON as an API. H2O integrates neatly into the popular data ecosystems of Hadoop, Amazon S3, NoSQL and SQL. We briefly discuss design choices in the implementation of Distributed Random Forest and Generalized Linear Modeling, and in bringing speed and scale to R, the vox populi of data science. We take a peek at the elegant Lego-like infrastructure that brings fine-grained parallelism to math over simple distributed arrays.
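The idea of fine-grained parallel math over a distributed array can be sketched as a map step over chunks followed by a reduce step over partial results. This is a generic illustration using local threads, not H2O's actual infrastructure; the names `chunked`, `partial_stats`, `combine` and `distributed_mean_var` are hypothetical, and a non-empty input is assumed.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Split a list into consecutive chunks, standing in for array partitions."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_stats(chunk):
    """Map step: per-chunk count, sum and sum of squares."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Reduce step: merging partial results is associative and order-free."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def distributed_mean_var(values, chunk_size=4, workers=2):
    """Mean and population variance computed from per-chunk partial sums."""
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    n, s, ss = functools.reduce(combine, partials, (0, 0.0, 0.0))
    mean = s / n
    return mean, ss / n - mean * mean


mean, var = distributed_mean_var([float(x) for x in range(1, 11)])
```

Because each chunk returns only three numbers, the reduce step moves very little data regardless of how large the chunks are.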

A short data-hacking demo presents the life cycle of data science: powerful data manipulation via R at scale, interactive summarization over large datasets, modeling using Elastic Net (GLM), grid search for the best parameters, and low-latency scoring.
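As a rough illustration of the Elastic Net and grid search steps, the sketch below fits a linear model by coordinate descent and picks the penalty settings with the lowest error on a held-out set. It is a toy, single-machine version under stated assumptions, not the demo's code or H2O's API: the penalty is assumed to be `lam * (alpha * L1 + (1 - alpha) / 2 * L2)`, there is no intercept, no column is all zeros, and all function names and data are hypothetical.

```python
import itertools


def soft_threshold(z, t):
    """Shrink z toward zero by t (the L1 part of the update)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X, y, alpha, lam, n_iter=200):
    """Coordinate descent for squared loss with an elastic net penalty."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred)
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam * alpha) / (zj / n + lam * (1 - alpha))
    return beta


def mse(X, y, beta):
    """Mean squared error of the linear prediction X @ beta."""
    preds = [sum(xi * b for xi, b in zip(row, beta)) for row in X]
    return sum((p_ - t) ** 2 for p_, t in zip(preds, y)) / len(y)


def grid_search(X_train, y_train, X_val, y_val, alphas, lams):
    """Return (error, alpha, lam, beta) with the lowest validation error."""
    best = None
    for alpha, lam in itertools.product(alphas, lams):
        beta = fit_elastic_net(X_train, y_train, alpha, lam)
        err = mse(X_val, y_val, beta)
        if best is None or err < best[0]:
            best = (err, alpha, lam, beta)
    return best


# Hypothetical noiseless data: y depends only on the first feature.
X_train = [[1, 1], [1, -1], [-1, 1], [-1, -1], [2, 0], [0, 2]]
y_train = [2 * row[0] for row in X_train]
X_val = [[3, 1], [-2, 2]]
y_val = [2 * row[0] for row in X_val]
best_err, best_alpha, best_lam, best_beta = grid_search(
    X_train, y_train, X_val, y_val, alphas=[0.5, 1.0], lams=[0.0, 0.1, 1.0]
)
```

On noiseless data like this the search should prefer no penalty at all; with noisy or high-dimensional data, a nonzero penalty usually wins.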

Presenter 

Srisatish Ambati 

0xdata Inc

Sri is co-founder and CEO of 0xdata (@hexadata), the builders of H2O. H2O democratizes big data science and makes Hadoop do math for better predictions. Before 0xdata, Sri spent time scaling R over big data with researchers at Purdue and Stanford. Prior to that, Sri co-founded Platfora and was the Director of Engineering at DataStax. Before that, Sri was Partner & Performance Engineer at the Java multi-core startup Azul Systems, tinkering with the entire ecosystem of enterprise apps at scale. Before that, Sri was on sabbatical pursuing Theoretical Neuroscience at Berkeley. Prior to that, Sri worked on a NoSQL trie-based index for semistructured data at the in-memory index startup RightOrder.

Sri is known for his knack for envisioning killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. A regular speaker on the big data, NoSQL and Java circuit, Sri leaves a trail at @srisatish.


  • John Cordell

    Good to see you!

    May 28

    • Michael Segel

      Sorry, missed it. Flight from London was late, traffic downtown took so long that I didn't get back in time.

      June 2

  • Manju

    Yes, we were all impressed!! Thanks so much, Satish and Josephine!

    May 31

  • Ofelia H. Ladao

    Excellent presentation! Nice to see the usefulness of the tool in relation to the exponential growth of data in the healthcare industry under the new healthcare reform. Thanks, Satish and Josephine!

    May 31

  • Ameena Lalani

    Very informative presentation by Satish. Thanks to Rob and Josephine (0xdata) for organizing this event. A simple request: please have a follow-up presentation on this topic by Satish. Also, if possible, the Orbitz location would be better than Jak's Tap. Thanks.

    May 29

  • Anil Chitreddy

    Can not make it. Freeing up the slot for others.

    May 28

  • nav

    Big data enthusiast

    May 26

Our Sponsors

  • Orbitz Worldwide

    A leading global online travel company and technology innovator.

  • Cloudera

    The leader in Apache Hadoop-based software and services.

  • HortonWorks

    A leading provider of support and services for Apache Hadoop.

  • TechNexus

    Chicago’s first collaborative ecosystem for tech entrepreneurs.

  • Oracle

    Industry leading hardware and software solutions for data management.

  • Couchbase

    Open source NoSQL for mission-critical systems.

  • Terracotta

    In-memory data management for the enterprise.
