
Machine Learning & In-Memory Computing

  • May 1, 2014 · 6:30 PM


6:30-7:00: Meet fellow members, networking
7:00-7:15: Welcome; raffle of one registration to the PASS Business Analytics Conference
[masked]:00: Machine Learning: The Race for Great Predictive Power
8:00-8:45: GridGain Open Source In-Memory Computing Platform
8:45-9:00: Raffle for DataEDGE Conference, discussion and networking


Data modeling has long been constrained by scale; sampling still rules the day for ad hoc analytics. Scale brings much-needed change to the modeling world. In this talk we present the predictive power of using sophisticated algorithms on big datasets. With large data sizes comes the particularly hard problem of unbalanced data with multiple asymmetrically rare classes. Missing features pose unique problems for most classification and regression algorithms, and proper handling can lead to greater predictive power. In the race for better predictions, H2O makes practical techniques accessible to anyone through an easy-to-use software product.

H2O is an open source math & machine learning engine for big data that brings distribution and parallelism to powerful algorithms while keeping the widely used languages of R and JSON as an API. It integrates neatly into popular data ecosystems of Hadoop, Amazon S3, NoSQL and SQL. We briefly discuss design choices in the implementation of Distributed Random Forest and Generalized Linear Modeling, and bringing speed and scale to the vox populi of Data Science, R. We take a peek at the elegant lego-like infrastructure that brings fine grained parallelism to math over simple distributed arrays.
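
As a rough illustration of what "math over simple distributed arrays" can look like, here is a minimal Python sketch, not H2O code. It assumes the array is split into chunks, each chunk is summarized independently (the map step), and the partial results are merged by addition (the reduce step). The names `chunk`, `map_chunk`, `combine` and `distributed_mean_var` are hypothetical, and a local thread pool stands in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunk(values, size):
    """Split a list into fixed-size pieces, standing in for array partitions."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_chunk(part):
    """Map step: summarize one chunk as (count, sum, sum of squares)."""
    return (len(part), sum(part), sum(x * x for x in part))


def combine(a, b):
    """Reduce step: partial summaries merge by element-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def distributed_mean_var(values, chunk_size=4, workers=2):
    """Mean and population variance of a non-empty list, computed chunk by chunk."""
    parts = chunk(values, chunk_size)
    # Hypothetical stand-in for a cluster: each chunk is handled by a pool worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, parts))
    n, total, total_sq = reduce(combine, partials, (0, 0.0, 0.0))
    mean = total / n
    return mean, total_sq / n - mean * mean


if __name__ == "__main__":
    print(distributed_mean_var([float(x) for x in range(1, 11)]))
```

Because the per-chunk summaries combine by plain addition, the order in which chunks finish does not matter, which is what makes this style of computation easy to parallelize.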

A short hacking data demo presents the life cycle of Data Science:
Powerful Data Manipulation via R at scale, Interactive Summarization over large datasets, Modeling using Elastic Net (GLM), Grid Search for best parameters & low-latency scoring.
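
The modeling and grid-search steps of that demo can be sketched in plain Python. This is a toy illustration under stated assumptions, not the H2O implementation or API: it assumes the common glmnet-style objective (1/2n)·RSS + lambda·(alpha·L1 + (1 − alpha)/2·L2²), fits it by coordinate descent without an intercept, and picks the (alpha, lambda) pair with the lowest validation error. The function names `soft_threshold`, `fit_elastic_net`, `mse` and `grid_search` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the L1 part of the update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha, lam, sweeps=200):
    """Coordinate descent for elastic net regression (no intercept, toy scale)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from every feature except j.
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            denom = z / n + lam * (1.0 - alpha)
            # Guard against an all-zero column with no ridge term.
            beta[j] = soft_threshold(rho / n, lam * alpha) / denom if denom > 0 else 0.0
    return beta


def mse(X, y, beta):
    """Mean squared error of the linear predictor X·beta."""
    errors = [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]
    return sum(e * e for e in errors) / len(y)


def grid_search(X_train, y_train, X_valid, y_valid, alphas, lams):
    """Try every (alpha, lambda) pair; return (validation MSE, alpha, lambda, beta)."""
    best = None
    for alpha in alphas:
        for lam in lams:
            beta = fit_elastic_net(X_train, y_train, alpha, lam)
            score = mse(X_valid, y_valid, beta)
            if best is None or score < best[0]:
                best = (score, alpha, lam, beta)
    return best
```

Here alpha = 1 gives a pure lasso penalty and alpha = 0 a pure ridge penalty. A real grid search would score on held-out data and would usually standardize the features first.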

SriSatish Ambati is Co-Founder and CEO of 0xdata (@hexadata), the builders of H2O. H2O democratizes big data science and makes Hadoop do math for better predictions. Prior to 0xdata, Sri held a variety of leadership roles in the private and academic sectors. He co-founded Platfora and was the Director of Engineering at DataStax. At Azul Systems, a Java multi-core startup, Sri was Partner & Performance Engineer, where he got to work on the entire ecosystem of enterprise apps at scale. In academia, Sri worked with researchers at Purdue and Stanford to scale R over big data, and pursued Theoretical Neuroscience at Berkeley. Sri is known for his knack for envisioning killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. A regular speaker on the BigData, NoSQL and Java circuit, Sri leaves a trail @srisatish.

Get an overview of GridGain 6.0, a Java-based Apache 2.0 licensed In-Memory Computing platform that combines clustering, high performance computing, streaming and Complex Event Processing (CEP), in-memory data grid, and Hadoop acceleration into one unified, easy to use platform.  GridGain software is used by hundreds of companies around the world to deliver unprecedented performance and scalability gains in a variety of industries including finance, mobile payments, in-game merchant platforms, hyper-local advertising, medical imaging, cognitive analytics and natural language processing applications.

Nikita Ivanov is Founder and CTO of GridGain Systems, started in 2007 and funded by RTP Ventures and Almaz Capital. Nikita has led GridGain in developing advanced distributed in-memory data processing technologies: the top Java in-memory computing platform, starting every 10 seconds around the world today.

Nikita has over 20 years of experience in software application development, building HPC and middleware platforms, contributing to the efforts of other startups and notable companies including Adaptec, Visa and BEA Systems. Nikita was one of the pioneers in using Java technology for server side middleware development while working for one of Europe’s largest system integrators in 1996.

He is an active member of the Java middleware community, a contributor to the Java specification, and holds a Master’s degree in Electro Mechanics from Baltic State Technical University, Saint Petersburg, Russia.


Thanks to GridGain for hosting the venue


Many thanks to the Professional Association for SQL Server (PASS) for donating a complimentary full-conference registration for the PASS Business Analytics Conference, May 7-9, 2014, at the San Jose Convention Center. The PASS Business Analytics Conference is a professional, community-oriented gathering for business analytics professionals. This all-access pass, valued at $1795, covers your full attendance at the conference, which begins with the evening Welcome Reception on Wednesday, May 7. It includes all sessions Thursday-Friday, May 8-9, as well as all evening events and conference meals.

The Professional Association for SQL Server (PASS) is an independent, not-for-profit association dedicated to supporting, educating, and promoting the global Microsoft SQL Server community.

The pass will be raffled at the May 1 meet-up.

DISCOUNT CODE: PADSA members receive $300 off by using discount code: BACPA300



Many thanks to the UC Berkeley School of Information for setting up a discount code, and for donating one complimentary registration for the DataEDGE Conference May 8-9, 2014 at UC Berkeley. The complimentary registration is valued at $650 and will be raffled at our May 1, 2014 meeting.

DataEDGE brings together social scientists, computer scientists, policy-makers, designers, and artists for an intimate two-day conference to assess the current state of data science and the data revolution. DataEDGE conference will bring you up to speed quickly on the current state of the data revolution. You will hear from leading experts in the field about the way organizations are using data to address business and societal issues, about the challenges of working with data at scale, and about the most pressing questions and debates facing data scientists today.

DISCOUNT CODE: PADSA members receive 10% off by using discount code DE14-D9KB



The ASE (Academy of Science and Engineering) is holding its “Big Data Conference” at Stanford May 27-31. PADSA members can now register for a full 5-day pass to the conference in exchange for volunteering some time to help at the conference. Volunteering is a great way to network with other technologists, and you get to attend the conference in exchange for your time.

Available shifts range from just a couple of hours to a full day. The organizers seem pretty flexible, so throw your hat in the ring and volunteer today!

Register to volunteer here (please register as "ASE member $0" and note “PADSA” or “Palo Alto Data Science Association” when you register)



  • Jim L

    An interesting way of doing ML. I am not sure I would agree. The presentation layer seemed to be a bit dated but that is very hard to do right anyway.

    May 1, 2014

  • Dan B.

    Sometimes in the evening, about 4, 5 or 6, highway 92 will jam with cars headed for the bridge.
    To avoid the jam I will go early, about 2 or 3pm. I'll wait for the Meetup at Starbucks which is a 3 min walk from Grid Gain.
    Google: Yelp Starbucks 1000 Metro Center Blvd. So I'll be at Starbucks doing Data Science with my laptop. All are welcome to join me. I'll be wearing a RedHat baseball cap. If they have no open tables, I will be at the Bagel place or the Burrito place next door.

    April 30, 2014

Our Sponsors

  • Aerospike

    A special thanks to Aerospike for hosting our June 5, 2014 meet-up

  • Modern Massive Data Sets Foundation

    Raffle 3 passes to MMDS 2014 Workshops June 17-20, 2014 at UC Berkeley

  • GigaOm

    Raffle+25% Discount Code to the STRUCTURE Conference June 18-19, 2014

  • Internet of Things World

    Raffle+15% discount code to Internet of Things World, June 16-19, 2014

  • Academy of Science and Engineering

    Free 5-day Stanford conference pass May 27-31, 2014

  • Professional Association for SQL Server

    Raffle+Discount Code to the PASS Business Analytics Conference May 7-9, 2014

  • UC Berkeley School of Information

    Raffle+Discount Code to the DataEDGE Conference May 8-9, 2014

  • GridGain

    A special thanks to GridGain for hosting our May 1, 2014 meet-up.

  • Innovation Enterprise

    Raffle+Discount Code to the Big Data Innovation Summit April 9-10, 2014

  • Google

    A special thanks to Google for hosting our April 3, 2014 meet-up.

  • DatumFora

    DatumFora raffled off 2 passes to the Data360 Conference April 2-3, 2014

  • Facebook

    Special thanks to Facebook for hosting our March 6, 2014 meet-up.

  • Opallios

    Peter Zadrozny, Founder and CTO, is our guest speaker March 6, 2014.

  • Groupon

    Special thanks to Groupon for hosting our February 6, 2014 meet-up.

  • O'Reilly Strata

    O'Reilly raffled a Strata Conference 2014, Making Data Work pass!
