
Big Data Science "Machine Learning Evening" @ Hadoop Summit, 2013





5 P.M. - 5:30 P.M. -- Networking

5:30 P.M. - 6:10 P.M. -- Session 1

Title: Apache Mahout: How it's good, how it's awesome, and where it falls short

Speaker: Ted Dunning, Chief Application Architect, MapR

Abstract: I will present an up-to-date report on the state of Mahout, including both its virtues and its vices. In particular, I will describe the systems that various real customers have built to produce real recommendations for their own customers using Mahout. I will also talk about some of the amazing speedups and quality improvements that Dan Filimon and others have achieved in clustering. I will also describe how Mahout can be used to solve some of the largest graph problems in the world. Then I will put some orange cones around the areas where Mahout is not as strong as other packages. We will close with audience participation to hear what people want to do with Mahout or wish that Mahout could do.

Speaker Bio: Ted has held Chief Scientist positions at Veoh Networks, ID Analytics and MusicMatch (now Yahoo Music). Ted was responsible for building the most advanced identity theft detection system on the planet, as well as one of the largest peer-assisted video distribution systems and ground-breaking music and video recommendation systems. Ted has 15 issued and 15 pending patents and contributes to several Apache open source projects, including Hadoop, ZooKeeper and HBase™. He is also a committer for Apache Mahout. Ted earned a BS degree in electrical engineering from the University of Colorado, an MS degree in computer science from New Mexico State University, and a Ph.D. in computing science from Sheffield University in the United Kingdom. Ted also bought the drinks at one of the very first Hadoop User Group meetings.

6:10 P.M. - 6:25 P.M. -- Q/A

6:25 P.M. - 6:30 P.M. -- Break

6:30 P.M. - 7:10 P.M. -- Session 2

Title: Implementing SVM in parallel on Hadoop

Speaker: Steven Hillion, Chief Product Officer, Alpine Data Labs

Abstract: Support Vector Machines are among the most powerful and mathematically mature algorithms in machine learning. They are well-suited to classification problems applied to complex datasets, a common scenario in the world of 'big data' and Hadoop. But kernel machines in general, and Support Vector Machines in particular, are not well suited to the MapReduce paradigm. The computational complexity in implementing SVM arises from the requirements for dual representation of data and model during training and prediction as well as the iterative nature of the popular convex optimizers for approximating the solution.

In this presentation, we propose an outline for implementing a very general and efficient form of SVM on the MapReduce framework using some interesting recent research in large-scale convex optimization and kernel computation theories, and we evaluate the performance of these methods with an implementation in the Alpine Data Labs machine-learning platform.
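Why iterative optimizers sit awkwardly on MapReduce can be seen in a minimal sketch. It is not the Alpine Data Labs implementation: it swaps in a linear (no-kernel) SVM trained by batch subgradient descent on the regularized hinge loss, and it simulates cluster partitions with plain Python lists. The names `map_partition`, `reduce_partials` and `train_linear_svm` are hypothetical, and the bias is assumed to be folded in as a constant 1.0 feature (so it is regularized like the other weights). Every training iteration needs one complete map and reduce pass over all partitions.

```python
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, int]  # (features, label in {-1, +1}); last feature is a constant 1.0 bias


def map_partition(w: Vector, partition: List[Example]) -> Tuple[Vector, int]:
    """Mapper (hypothetical): partial hinge-loss subgradient for one data partition."""
    grad = [0.0] * len(w)
    for x, y in partition:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # inside the margin: this example contributes -y * x
            for j, xj in enumerate(x):
                grad[j] -= y * xj
    return grad, len(partition)


def reduce_partials(partials: List[Tuple[Vector, int]]) -> Tuple[Vector, int]:
    """Reducer (hypothetical): sum partial subgradients and example counts."""
    total = [0.0] * len(partials[0][0])
    count = 0
    for grad, n in partials:
        for j, gj in enumerate(grad):
            total[j] += gj
        count += n
    return total, count


def train_linear_svm(partitions: List[List[Example]], dim: int,
                     lam: float = 0.01, epochs: int = 200, lr: float = 0.1) -> Vector:
    """Driver: minimize lam/2 * ||w||^2 + mean hinge loss by subgradient descent."""
    w = [0.0] * dim
    for t in range(1, epochs + 1):
        # Each iteration is a full map + reduce pass over all of the data.
        partials = [map_partition(w, p) for p in partitions]
        grad, n = reduce_partials(partials)
        step = lr / t ** 0.5  # decaying step size
        w = [wi - step * (lam * wi + gj / n) for wi, gj in zip(w, grad)]
    return w


def predict(w: Vector, x: Vector) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    # Two toy partitions, as if stored on two different nodes.
    parts = [[([2.0, 1.0, 1.0], 1), ([1.5, 2.0, 1.0], 1)],
             [([-2.0, -1.0, 1.0], -1), ([-1.5, -2.0, 1.0], -1)]]
    w = train_linear_svm(parts, dim=3)
    print([predict(w, x) for p in parts for x, _ in p])  # predicted labels
```

With 200 epochs, a real Hadoop job would launch 200 separate MapReduce passes over the data, which is the overhead the talk sets out to reduce; kernelized (dual) SVMs add the further cost of pairwise kernel evaluations that this linear sketch avoids.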

Speaker Bio: Steven Hillion has been leading large engineering and analytics projects for fifteen years. Before joining Alpine Data Labs, he founded the analytics group at Greenplum, leading a team of data scientists and also designing and developing new open-source and enterprise analytics software. Before that, he was Vice President of Engineering at M-Factor, Inc. (acquired by DemandTec), where he built analytical applications that became a global standard for demand modeling. Earlier, at Kana Communications, Steven led the engineering group during the two largest releases of its flagship product. At Scopus Technology (later Siebel Systems) he co-founded development groups for finance, telecom and other verticals. He received his Ph.D. in mathematics from the University of California, Berkeley, and was a King Charles I Scholar at Oxford University.

7:10 P.M. - 7:25 P.M. -- Q/A

7:25 P.M. - 7:30 P.M. -- Break

7:30 P.M. - 7:50 P.M. -- Session 3

Title: Big Data + Better Algorithms ==> Better Predictions with H2O

Speaker: SriSatish Ambati, Founder and CEO, 0xdata

Abstract: H2O's fast, highly scalable open-source algorithms are set to revolutionize predictive analytics. A math engine that brings interactivity and scale to Big Data modeling opens up new possibilities, including modeling without sampling. In this talk, we describe our popular distributed algorithms for classification, regression and clustering, and early signs of superior predictive performance with ample help from Big Data. We also take a peek at H2O's infrastructure: fine-grained parallelism that reduces early and often, leading to much less intermediate data and much better memory behavior. Finally, we show how we are democratizing big data science with ease of use and transparency that can entice new data enthusiasts into an exclusive sport!
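The idea of parallelism that "reduces early and often" can be illustrated with a small sketch. This is not H2O's code or API: the names `Moments`, `map_chunk`, `tree_reduce` and `mean_and_variance` are hypothetical, and computing a mean and variance is only an assumed example task. Each chunk of rows is folded into a three-number partial result as soon as it is scanned, and partial results are merged pairwise in a tree, so the intermediate data is one small record per chunk rather than one record per row.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Moments:
    """Small partial result for one chunk: count, sum and sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def combine(self, other: "Moments") -> "Moments":
        return Moments(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)


def map_chunk(chunk: List[float]) -> Moments:
    """Reduce inside the chunk immediately: one tiny record per chunk, not per row."""
    m = Moments()
    for v in chunk:
        m.n += 1
        m.total += v
        m.total_sq += v * v
    return m


def tree_reduce(parts: List[Moments]) -> Moments:
    """Merge partial results pairwise, halving their number at each level."""
    while len(parts) > 1:
        merged = []
        for i in range(0, len(parts), 2):
            if i + 1 < len(parts):
                merged.append(parts[i].combine(parts[i + 1]))
            else:
                merged.append(parts[i])  # odd one out moves up unchanged
        parts = merged
    return parts[0]


def mean_and_variance(chunks: List[List[float]]) -> Tuple[float, float]:
    """Population mean and variance over all chunks (needs at least one value)."""
    m = tree_reduce([map_chunk(c) for c in chunks])
    mean = m.total / m.n
    return mean, m.total_sq / m.n - mean * mean


if __name__ == "__main__":
    print(mean_and_variance([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]))
```

The sum-of-squares variance formula keeps the sketch short but can lose precision on large values; a stabler merge of per-chunk means and squared deviations would be preferable in practice.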

Speaker Bio: Sri is co-founder and CEO of 0xdata (@hexadata), the builders of H2O. H2O democratizes big data science and makes Hadoop do math for better predictions. Before 0xdata, Sri spent time scaling R over big data with researchers at Purdue and Stanford. Prior to that, Sri co-founded Platfora and was the Director of Engineering at DataStax. Before that, Sri was Partner & Performance Engineer at the Java multi-core startup Azul Systems, tinkering with the entire ecosystem of enterprise apps at scale. Before that, Sri was on sabbatical pursuing theoretical neuroscience at Berkeley. Prior to that, Sri worked on a NoSQL trie-based index for semi-structured data at the in-memory index startup RightOrder.

Sri is known for his knack for envisioning killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. A regular speaker on the Big Data, NoSQL and Java circuit, Sri leaves a trail at @srisatish.


7:50 P.M. - 8:00 P.M. -- Q/A

8:00 P.M. - 8:10 P.M. -- Break

8:10 P.M. - 8:30 P.M. -- Session 4

(Summary of this event and ideas to explore in future Big Data Science events)

Title: Model, Methodology, Metadata and Machine Learning -- Why and How

Speaker: Shyam SunDar Sarkar, Organizer of Big Data Science and CEO of AyushNet

Abstract: Big data involves 4 V's: high-volume, high-velocity, high-variety and/or high-variability information assets that require new forms of processing for decision making, insight discovery and process optimization. Machine learning, a branch of artificial intelligence, was originally employed to develop techniques that enable computers to learn. Today, it includes a number of advanced statistical methods for regression and classification with Big Data information assets, and machine learning has applications in a wide variety of domains, including cancer genomics, medical diagnostics, credit card fraud detection, face and speech recognition, the latest financial regulations, and stock market analysis. Evolving Big Data Science applications need new models and new methodologies for machine learning with Big Data and metadata. Our vision is to characterize 4 M's (Model, Methodology, Metadata and Machine Learning) to address the processing complexities of Big Data with its 4 V's.

8:30 P.M. - 8:45 P.M. -- Suggestions, Questions/Answers

8:45 P.M. - 10:00 P.M. -- Demos and Networking at Sponsors' Tables


  • Ramarao Y.

    Can someone point me to the slides from these presentations? Thanks.

    July 24, 2013

  • Avilash K.

    Hi,
    I need some help.
    I followed the tutorial on the Nutch website.
    I am using Nutch 1.6 with Solr 3.6.
    Everything went well until the end, but when I passed a search query,
    it returned no results.

    July 2, 2013

  • Sumedha S.

    I was interested to attend, but could not make it.

    June 26, 2013

  • Amir Y.

    Posting slides?

    June 26, 2013

  • Madhu K.

    Wish I could attend but something has come up.

    June 25, 2013

  • Craig M.

    Hoping you find a bigger room.

    June 17, 2013

  • A former member

    Bummer this is full-up. Any chance of larger venue, streaming, recording, or slides?

    June 13, 2013

    • Shyam S.

      Please come to the event. If you do not get a seat, you can stand and listen to the talks. There will also be another event running in parallel, and many people will come and go for the parallel sessions.

      June 14, 2013

  • Bing H.

    I am a framework developer at Apple. I have a keen interest in machine learning.

    June 13, 2013

  • Sean W.

    Late arrival!

    June 6, 2013

  • Sumedha S.

    Interested to attend.

    May 14, 2013

  • Sumedha S.

    Attended one meeting and was hooked on Hadoop! I would like to have a conversation about Hadoop and MapReduce with someone interested in integrating statistical tools into the dynamic data stream moving through the system. I am currently reading about this network to understand the I/O and the processing system. I am an applied statistician.

    May 14, 2013

  • Ravi Kiran K.

    Deeply interested; will definitely attend.

    May 3, 2013
