MACHINE LEARNING EVENING
5 P.M. - 5:30 P.M. -- Networking
5:30 P.M. - 6:10 P.M. -- Session 1
Title: Apache Mahout: How it's good, how it's awesome, and where it falls short
Speaker: Ted Dunning, Chief Application Architect, MapR
Abstract: I will present an up-to-date report on the state of Mahout, including both virtues and vices. In particular, I will present system descriptions of how various real customers have produced real recommendations for real customers using Mahout. I will also talk about some of the amazing speedups and quality improvements that Dan Filimon and others have achieved in clustering, and describe how Mahout can be used to solve some of the largest graph problems in the world. Then I will put some orange cones around the areas where Mahout is not as strong as other packages. We will close with audience participation to hear what people want to do with Mahout or wish that Mahout could do.
Speaker Bio: Ted has held Chief Scientist positions at Veoh Networks, ID Analytics and MusicMatch (now Yahoo Music). Ted is responsible for building the most advanced identity theft detection system on the planet, as well as one of the largest peer-assisted video distribution systems and ground-breaking music and video recommendation systems. Ted has 15 issued and 15 pending patents and contributes to several Apache open source projects, including Hadoop, ZooKeeper and HBase. He is also a committer for Apache Mahout. Ted earned a BS degree in electrical engineering from the University of Colorado, an MS degree in computer science from New Mexico State University, and a Ph.D. in computing science from Sheffield University in the United Kingdom. Ted also bought the drinks at one of the very first Hadoop User Group meetings.
6:10 P.M. - 6:25 P.M. -- Q/A
6:25 P.M. - 6:30 P.M. -- Break
6:30 P.M. - 7:10 P.M. -- Session 2
Title: Implementing SVM in parallel on Hadoop
Speaker: Steven Hillion, Chief Product Officer, Alpine Data Labs
Abstract: Support Vector Machines are among the most powerful and mathematically mature algorithms in machine learning. They are well-suited to classification problems applied to complex datasets, a common scenario in the world of 'big data' and Hadoop. But kernel machines in general, and Support Vector Machines in particular, are not well suited to the MapReduce paradigm. The computational complexity in implementing SVM arises from the requirements for dual representation of data and model during training and prediction as well as the iterative nature of the popular convex optimizers for approximating the solution.
In this presentation, we propose an outline for implementing a very general and efficient form of SVM on the MapReduce framework using some interesting recent research in large-scale convex optimization and kernel computation theories, and we evaluate the performance of these methods with an implementation in the Alpine Data Labs machine-learning platform.
Speaker Bio: Steven Hillion has been leading large engineering and analytics projects for fifteen years. Before joining Alpine Data Labs, he founded the analytics group at Greenplum, leading a team of data scientists and also designing and developing new open-source and enterprise analytics software. Before that, he was Vice President of Engineering at M-Factor, Inc. (acquired by DemandTec) where he built analytical applications that became a global standard for demand modeling. Earlier, at Kana Communications, Steven led the engineering group during the two largest releases of its flagship product. At Scopus Technology (later Siebel Systems) he co-founded development groups for finance, telecom and other verticals. He received his Ph.D. in mathematics from the University of California, Berkeley, and was a King Charles I Scholar at Oxford University.
7:10 P.M. - 7:25 P.M. -- Q/A
7:25 P.M. - 7:30 P.M. -- Break
7:30 P.M. - 7:50 P.M. -- Session 3
Title: Big Data + Better Algorithms ==> Better Predictions with H2O
Speaker: SriSatish Ambati, Founder and CEO, 0xdata
Abstract: H2O's fast, high-scale, open source algorithms are set to revolutionize Predictive Analytics. A math engine that brings interactivity and scale to Big Data Modeling heralds new possibilities, including modeling without sampling. In this talk, we describe our popular distributed algorithms for Classification, Regression & Clustering and early signs of superior predictive performance with ample help from Big Data. We also take a peek at H2O's infrastructure: fine-grained parallelism that reduces early and often, leading to a lot less intermediate data and much better memory behavior. Finally, we show how we are democratizing big data science with ease of use and transparency that can entice new data enthusiasts into an exclusive sport!
Speaker Bio: Sri is co-founder and CEO of 0xdata (@hexadata), the builders of H2O. H2O democratizes big data science and makes Hadoop do math for better predictions. Before 0xdata, Sri spent time scaling R over big data with researchers at Purdue and Stanford. Prior to that, Sri co-founded Platfora and was the Director of Engineering at DataStax. Before that, Sri was Partner & Performance Engineer at the Java multi-core startup Azul Systems, tinkering with the entire ecosystem of enterprise apps at scale. Before that, Sri was on sabbatical pursuing Theoretical Neuroscience at Berkeley. Prior to that, Sri worked on a NoSQL trie-based index for semistructured data at the in-memory index startup RightOrder.
Sri is known for his knack for envisioning killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. A regular speaker on the BigData, NoSQL and Java circuit, Sri leaves a trail at @srisatish.
7:50 P.M. - 8:00 P.M. -- Q/A
8:00 P.M. - 8:10 P.M. -- Break
8:10 P.M. - 8:30 P.M. -- Session 4
(Summary of this event and ideas to explore in future Big Data Science events)
Title: Model, Methodology, Metadata and Machine Learning -- Why and How
Speaker: Shyam SunDar Sarkar, Organizer of Big Data Science and CEO of AyushNet
Abstract: Big data involves 4 V's: high volume, high velocity, high variety and/or high variability information assets that require new forms of processing for decision making, insight discovery and process optimization. Machine learning, a branch of artificial intelligence, was originally employed to develop techniques to enable computers to learn. Today, it includes a number of advanced statistical methods for regression and classification with Big Data information assets, and there are machine learning applications in a wide variety of domains, including cancer genomics, medical diagnostics, credit card fraud detection, face and speech recognition, the latest financial regulations and analysis of the stock market. Evolving Big Data Science applications need new models and new methodologies for machine learning with Big Data and Metadata. Our vision is to characterize 4 M's (Model, Methodology, Metadata and Machine Learning) to address the processing complexities of Big Data with its 4 V's.
8:30 P.M. - 8:45 P.M. -- Suggestions, Questions/Answers
8:45 P.M. - 10:00 P.M. -- Demo and Networking at Individual Sponsor Tables