The promise of big data is better predictions. No single model works best for all of your data: predictive performance is domain specific, and what works in one data domain often matters very little in another. Data science needs to get closer to the business and unlock value.
Ensembles are here to stay! Users want a buffet of algorithms that try to "lock-pick" the data for its secrets. Time is ultimately the key limiter. Data science efforts have to make the best of the budget for experimentation and use some kind of co-evolutionary technique that picks the "Champion" model of models for your data. Robust automation and fast analytics can speed up large parts of the data smithy. In this talk we discuss ensemble techniques based on boosting and trees that, when applied to real use cases, lead to substantially better predictions. The H2O open source machine learning platform will be used as a demo bed for GBM (gradient boosting machines) and RF (random forests).
Sri is co-founder and CEO of 0xdata (@hexadata), the builders of H2O. H2O democratizes big data science and makes Hadoop do math for better predictions. Before 0xdata, Sri spent time scaling R over big data with researchers at Purdue and Stanford. Prior to that, Sri co-founded Platfora and was the Director of Engineering at DataStax. Before that, Sri was Partner & Performance Engineer at the Java multi-core startup Azul Systems, tinkering with the entire ecosystem of enterprise apps at scale. Before that, Sri was on sabbatical pursuing Theoretical Neuroscience at Berkeley. Prior to that, Sri worked on a NoSQL trie-based index for semistructured data at the in-memory index startup RightOrder. Sri is known for his knack for envisioning killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. A regular speaker on the Big Data, NoSQL and Java circuit, Sri leaves a trail at @srisatish.