Data Science & Hadoop

 

Knitting Boar

Josh Patterson (Cloudera)

Online learning techniques, such as Stochastic Gradient Descent (SGD), are powerful when applied to risk minimization and convex games on large problems. However, their sequential design prevents them from taking advantage of newer distributed frameworks such as Hadoop/YARN. In this session, we will introduce “Knitting Boar”, an open-source Java library for performing distributed online learning on a Hadoop cluster under YARN. We will give an overview of how Knitting Boar works and examine the lessons learned from YARN application construction.
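The abstract above does not say how the sequential SGD loop is parallelized. One common scheme is iterative parameter averaging: each worker runs plain SGD on its own data shard, and a master averages the resulting weight vectors before the next round. The following is a minimal single-process sketch under that assumption. All function names are hypothetical, the "workers" are simulated with a list comprehension, and none of this is Knitting Boar's actual Java API.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_pass(weights, shard, learning_rate):
    """Run one sequential SGD pass of logistic regression over a single shard."""
    w = list(weights)  # copy so the shared starting weights are not mutated
    for features, label in shard:
        prediction = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
        error = prediction - label
        for i, xi in enumerate(features):
            w[i] -= learning_rate * error * xi
    return w


def average_weights(weight_vectors):
    """Component-wise mean of the workers' weight vectors."""
    n = len(weight_vectors)
    return [sum(column) / n for column in zip(*weight_vectors)]


def train_parameter_averaging(shards, num_features, rounds=20, learning_rate=0.1):
    """Assumed scheme: every round, each worker starts from the global weights,
    trains on its own shard, and the results are averaged."""
    global_weights = [0.0] * num_features
    for _ in range(rounds):
        # On a real cluster each call would run on a separate worker.
        local = [sgd_pass(global_weights, shard, learning_rate) for shard in shards]
        global_weights = average_weights(local)
    return global_weights


def make_toy_data(n, seed=0):
    """Toy data (an assumption for illustration): label is 1 when x > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append(([1.0, x], 1 if x > 0 else 0))  # first feature is the bias term
    return data


def accuracy(weights, data):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    correct = 0
    for features, label in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(weights, features)))
        correct += int((p >= 0.5) == (label == 1))
    return correct / len(data)


if __name__ == "__main__":
    data = make_toy_data(400)
    shards = [data[i::4] for i in range(4)]  # four simulated workers
    weights = train_parameter_averaging(shards, num_features=2)
    print("weights:", weights)
    print("training accuracy:", accuracy(weights, data))
```

Only the averaged weight vector crosses worker boundaries in this scheme, so the per-round communication cost depends on the number of features rather than the number of training examples.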

 

 

The content will be similar to a talk given at Strata / Hadoop World in NYC:

http://strataconf.com/stratany2012/public/schedule/detail/25445

 

If you want to see a non-distributed example of SGD, please check out:

http://scikit-learn.org/0.11/modules/sgd.html
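For readers who want to see the core update rule without any library, here is a dependency-free sketch of sequential SGD fitting a straight line by least squares. It is an illustration only, not scikit-learn's implementation. The function name, the toy data (points on y = 2x + 1), and the hyperparameters are all assumptions chosen for the example.

```python
import random


def sgd_linear_regression(samples, learning_rate=0.1, epochs=200, seed=0):
    """Fit y ~ w0 + w1 * x by SGD on squared error, one example at a time."""
    rng = random.Random(seed)
    w0, w1 = 0.0, 0.0
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order each epoch
        for x, y in order:
            error = (w0 + w1 * x) - y
            # Gradient of 0.5 * error**2 with respect to (w0, w1) is (error, error * x).
            w0 -= learning_rate * error
            w1 -= learning_rate * error * x
    return w0, w1


if __name__ == "__main__":
    # Noise-free toy data on the line y = 2x + 1, with x spread over [0, 1].
    points = [(i / 19.0, 2.0 * (i / 19.0) + 1.0) for i in range(20)]
    intercept, slope = sgd_linear_regression(points)
    print("intercept:", intercept, "slope:", slope)
```

Each update depends on the weights left by the previous one, which is the sequential dependency that makes SGD awkward to spread across a cluster.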

 


  • Adam M.

    November 28, 2012

  • Mahan

    Good introduction to machine learning in a parallel dev environment from a guy who has done it.

    November 27, 2012

  • Adam M.

    Thanks for coming out, everyone. I will post the deck tomorrow and set up a tentative time in January for the data science hands-on. I will also put up a link to the AngelHack thing on Dec 1st.

    November 27, 2012

  • Adam M.

    [masked]

    November 27, 2012

    • Adam M.

      Text or phone me if you can't get in.

      November 27, 2012

  • Hardik

    Unfortunately, I won't be able to make it tonight. Is any content going to be posted online for later use?

    November 27, 2012

  • James W.

    I assume a laptop won't be needed? Also, is there a URL for where the "Knitting Boar" library is hosted? Google didn't seem to return much...

    November 20, 2012

    • Adam M.

      No laptop will be needed. We'll have mostly standing room. Think of it as a nerdy party with a speaker. :) Here are the Knitting Boar libraries: https://github.com/jpa...

      November 22, 2012

  • Adam M.

    Update: depending on ACTUAL turnout, we will probably have standing room only for most of you at this event. That's OK, it's a cool place to stand and you can mingle with Toronto's startup community. :)

    That being said, please be Canadian and save some chairs for those who might need them more than you...

    2 · November 20, 2012

  • Sri T.

    Adam,
    Is this suitable for beginner-level Hadoopers like me?

    November 17, 2012

    • Adam M.

      I would say it is suitable for anyone who has a real interest in applying complicated analytics on Hadoop. If you are totally new to Hadoop, new to machine learning, and not looking to apply any of these skills in the future, then it might not be for you. I'm trying to ramp up the technical level for some of the meetups to bring back some actual Hadoop users who might have been alienated by the introductory material. I think Hadoop users and math or comp sci students should definitely go, regardless of their experience level. There will be better meetups for networking; this kind of session helps to enhance the collective group knowledge and get the experienced users together.

      1 · November 17, 2012

    • Sri T.

      Thanks, Adam. Definitely not new to Hadoop or machine learning, and already in the process of applying them to real-world solutions. I believe I should be there. Thanks again.

      November 17, 2012

  • Victor W.

    Will any related algorithms be introduced or explored?

    November 16, 2012

    • Adam M.

      Likely I will cover Mahout and what it can do, but there will not be time to deep dive on anything in it. Josh is going to cover some SGD content and how to make that work in a distributed way over YARN/MR2.

      November 16, 2012

  • Adam M.

    Still working on a location that's big enough, folks. Please try to make your RSVPs accurate, as I'm booking space based on them.

    Also, check out some of Josh's github goodies, examples on Mahout:
    https://github.com/jpatanooga/MahoutExamples

    November 6, 2012

    • Inbae A.

      Hey Adam. bNotions' swanky new office near St. Lawrence Market can accommodate. I sent a message to your Gmail with details.

      November 16, 2012

    • Adam M.

      Thanks, going to reach out today...

      1 · November 16, 2012

  • Surendra M.

    Is any WebEx link available for remote people?

    October 31, 2012

    • Adam M.

      By the way, this group is to foster a tech community in Toronto. I just noticed that you are based out of India. If you do not live in Ontario or work/visit on a regular basis, then I would suggest that you leave the meetup group for those who are local. By all rights you can't really contribute to a local community. I encourage you to explore some distance learning options from Cloudera or other Apache Hadoop vendors, as they are geared towards remote learning and would be more valuable to you than scraps from local events. You could always start your own local Hadoop user group too.

      October 31, 2012

    • Paresh Y.

      They can start a group but they won't have Adam Muise to make it as good as TOHUG!

      October 31, 2012

  • Adam M.

    Thanks Tri! There have been a few proposals and I'll organize something based on the speaker and crew size. If we have a lot of people and the speaker is good, I'll want to accommodate as many as possible for maximum value to the group. I'm going to focus on tomorrow's PigFest and then I'll make sure we have a solid date, time, and place for the Data Science meetup.

    October 29, 2012

  • Tri N.

    We can volunteer our lunch / meeting room. It can accommodate around 50 to 60 people. There is a projector. If needed, an audio system can also be hooked up.

    T4G Limited (http://www.t4g.com)
    100 Broadview Ave.
    Suite 300
    Toronto, ON M4M 3H3
    http://maps.google.ca/maps?q=43.658496,-79.349678&hl=en&sll=43.656877,-79.32085&sspn=0.715361,1.483154&t=m&z=16

    October 25, 2012

Our Sponsors

  • IBM

    Meeting facilities, expert speakers, free product, books and education.

  • Big Data University

    Free on-line courses in Hadoop and big data related technologies.

  • Cloudera

    10% off training for Toronto Hadoop User Group members.

  • Hortonworks

    Food, speakers, beverages

  • T4G

    Hosting Meeting locations and providing relevant speakers
