
Apache Bigtop Working Meeting -- Hadoop M/R coding and porting Hama to Bigtop

 

Review of Labs 1, 2, 3 for new members on 2/25/12

 

The IntelliJ licenses have been distributed to registered members as of 1/27... 

 

This is a working group meeting for Java programmers interested in becoming Apache Bigtop committers, with corporate support/donations from MSFT, Amazon, JetBrains, and Cloudera. The focus is on learning how to write cloud code using Hadoop, Hive, Flume, Sqoop, HBase, Mahout, and Hama.

 

Members on the confirmed sign-up list have been sent MSFT Hadoop Azure codes and AWS codes for free cluster time, sponsored by MSFT and Amazon, as of 1/11/12.

 

Roman from Cloudera will be available for questions.

 

This is not a class. This is a working group meeting where you can see what others have done, the code they have written, and how other programmers go about deconstructing complicated pieces of software. The format is self-paced: each member works at their own pace. The material presented here is equivalent to what you would get if you joined Cloudera as a new employee. Members show demos of Bigtop install, Bigtop build, and Bigtop integration testing in Groovy and Java, and of writing code using the Hadoop, Hive, Flume, Sqoop, HBase, Mahout, and Hama components.

 

Puppet review once material is ready.

 

The purpose of these working group meetings is to train Java programmers to contribute first to Apache Bigtop (incubating) and then to other Hadoop ecosystem components.

 

Bigtop is a software framework, open sourced by Cloudera, that is used to build, deploy, and validate Hadoop distributions. It packages a big data stack, currently consisting of Hadoop, Hive, Flume, Sqoop, HBase, and Mahout, into RPM and DEB packages.

This is a good starter project if you are interested in getting hands-on programming experience in Hadoop without having to become a Map Reduce or distributed computing expert first.

So far we have shown how to do an install, the Apache Jira ticket workflow, the Jenkins build systems for Hadoop/Cloudera, and system/integration test creation and execution against a pseudo-distributed cluster.

Week 1: Installing Bigtop (documentation complete; Bigtop webpage and PDF/Word files).

Week 2: Building Bigtop on VirtualBox or a Linux instance (documentation complete; Bigtop webpage and PDF/Word docs).

Week 3: Create a Hadoop integration test based on a simple Map Reduce job and execute it via the Bigtop test execution framework. Documentation in progress.

Week 4: Run the labs again on AWS; deploy on AWS using Puppet. Documentation in progress.

Week 5: Repetition of Bigtop install, build, and integration testing on an AWS Ubuntu instance.

Week 6: Repetition of Bigtop install, build, and integration testing on an AWS Ubuntu instance. Basic Map Reduce programming using the Eclipse Map Reduce plugin and using Eclipse in an AWS instance. How to run Bigtop integration tests inside Eclipse. Review of DEB files: reverse engineering the Hadoop distribution DEB file format vs. the Bigtop file format.

Victor: Bigtop integration testing demo in the AWS cloud and a VirtualBox instance.

Vijay: Getting Hama to run, and basic DEB files.

 

Week 7 (1/28/12): Writing Map Reduce code review, integration testing, and AWS review.

Week 8 (2/4/12): More Map Reduce programming and presentations. Map Reduce is the first widely accepted programming model for commodity PC-grade distributed systems. While many programs will not fit such a model, it is important to develop proficiency in it for debugging and running programs inside Hadoop clusters at scale, specifically programs more complex than a merge sort, which can exhibit hot spots if they are not designed correctly.

 

Week 9 (2/11/12): Integration testing, development of HBase/Hive/Pig backend code. Map Reduce graph algorithms. More Hama DEB file development.

 

Week 10 (2/25/12): Integration testing, MR/Hama test code, Hama/Nutch DEB file development.
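
For members new to the Map Reduce model mentioned in the schedule above, here is a minimal in-memory sketch of a word count in plain Java. It does not use the Hadoop API: the class name `MapReduceSketch` and the methods `map`, `shuffle`, `reduce`, `wordCount`, and `reducerLoad` are hypothetical names chosen for illustration, and partitioning by key hash is an assumption about how keys are routed. The last method counts how many values land on each simulated reducer, which is one simple way to see a hot spot.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of the Map Reduce model. Illustration only, not Hadoop API code. */
class MapReduceSketch {

    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle phase: route each pair to a reducer by hashing its key (assumed scheme). */
    public static List<Map<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> pairs, int numReducers) {
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> pair : pairs) {
            int partition = Math.floorMod(pair.getKey().hashCode(), numReducers);
            partitions.get(partition)
                      .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                      .add(pair.getValue());
        }
        return partitions;
    }

    /** Reduce phase: sum all values that arrived for one key. */
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Runs map, shuffle, and reduce over the input lines and returns sorted word counts. */
    public static Map<String, Integer> wordCount(List<String> lines, int numReducers) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<Integer>> partition : shuffle(pairs, numReducers)) {
            for (Map.Entry<String, List<Integer>> entry : partition.entrySet()) {
                counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
            }
        }
        return counts;
    }

    /** Number of values each reducer receives; a lopsided array signals a hot spot. */
    public static int[] reducerLoad(List<String> lines, int numReducers) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        int[] load = new int[numReducers];
        List<Map<String, List<Integer>>> partitions = shuffle(pairs, numReducers);
        for (int i = 0; i < numReducers; i++) {
            for (List<Integer> values : partitions.get(i).values()) {
                load[i] += values.size();
            }
        }
        return load;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat sat", "the dog sat", "the end");
        System.out.println(wordCount(lines, 2));
        System.out.println(Arrays.toString(reducerLoad(lines, 2)));
    }
}
```

Running `main` prints the counts `{cat=1, dog=1, end=1, sat=2, the=3}` followed by the per-reducer load. Because every occurrence of a key goes to the same reducer, an input dominated by one word sends almost all values to a single reducer no matter how many reducers are configured. That is the kind of hot spot the Week 8 notes warn about.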

 

Guest lectures to come....

See what progress you can make after installing Bigtop. Follow the directions in the README and debug.

A Biocurious membership is required for attendees from their second visit onward; the first visit is free, per the Biocurious policy for using the space. This is not a charge collected by this meetup group or by any individual, contributor, or participant in this group, in full or in part. Please join on the Biocurious website.

http://apachebigtop.p....


  • doug c.

    ppl making progress on projects!!!

    February 26, 2012

  • doug c.

    No, unless I know you I don't distribute cluster codes, because of past abuses and legal liability. Feel free to apply directly to the MSFT Azure CTP. All code and handouts are available at the wiki address listed at the bottom of the meetup description.

    February 25, 2012

  • Kaniska M.

    Unfortunately I missed today's event.
    Is there any chance I can receive the 'MSFT Hadoop AWS code for free cluster time'? I have set up an Ubuntu VM.
    So it would be great if I could get the necessary sample Hadoop code to play with.
    Many thanks for making the Hadoop environment affordable for programmers.

    February 25, 2012

  • doug c.

    Hey, this is good Bikram.. good work... like the script.. .

    February 21, 2012

  • Ron B.

    here is a blog post on m/r patterns you might find useful http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/

    February 14, 2012

  • Lei Z.

    I will go. Just FYI, I posted some of my notes at stones333.blogspot.com/ enjoy.

    February 14, 2012

  • Roman V S.

    Guys, don't forget to register and show up at HUG this coming Wednesday: http://www.meetup.com/hadoop/events/36789782/

    Don't mind "no spots left". The venue is big enough and the # of attendees fluctuates quite a bit.

    February 12, 2012

20 went
