Apache Bigtop Working Meeting -- Hadoop M/R coding and porting Hama to Bigtop

Hosted By
doug c.
Details

Review of Labs 1, 2, 3 for new members on 2/25/12

The IntelliJ licenses have been distributed to registered members as of 1/27...

This is a working group meeting for Java programmers interested in becoming Apache Bigtop committers, with corporate support and donations from MSFT, Amazon, JetBrains and Cloudera. The focus is on learning how to write cloud code using Hadoop, Hive, Flume, Sqoop, HBase, Mahout and Hama.

As of 1/11/12, members on the confirmed sign-up list have been sent MSFT Hadoop on Azure codes and AWS codes for free cluster time, sponsored by MSFT and Amazon.

Roman from Cloudera will be available for questions.

This is not a class. It is a working group meeting where you can see what others have done, the code they have written, and how other programmers go about deconstructing complicated pieces of software. The format is self-paced: each member works at their own pace. The material presented here is equivalent to what you would get if you joined Cloudera as a new employee. Members show demos of Bigtop install, Bigtop build, Bigtop integration testing in Groovy and Java, and writing code using the Hadoop, Hive, Flume, Sqoop, HBase, Mahout and Hama components.

A Puppet review will follow once the material is ready.

The purpose of these working group meetings is to train Java programmers to contribute first to Apache Bigtop (incubating) and then to other Hadoop ecosystem components.

Bigtop is a software framework, open sourced by Cloudera, that is used to build, deploy and validate Hadoop distributions (the big data stack, currently consisting of Hadoop, Hive, Flume, Sqoop, HBase and Mahout, packaged as RPM and DEB packages).

This is a good starter project if you are interested in getting hands on programming experience in Hadoop without having to become a Map Reduce or Distributed Computing expert first.

So far we have covered installation, the Apache JIRA ticket workflow, the Jenkins build systems for Hadoop/Cloudera, and system/integration test creation and execution against a pseudo-distributed cluster.

Week 1: Installing Bigtop (documentation complete: Bigtop webpage and PDF/Word files).

Week 2: Building Bigtop on VirtualBox or a Linux instance (documentation complete: Bigtop webpage and PDF/Word docs).

Week 3: Creating a Hadoop integration test based on a simple Map Reduce job and executing it via the Bigtop test execution framework. Documentation in progress.

Week 4: Running the labs again on AWS and deploying on AWS using Puppet. Documentation in progress.

Week 5: Repetition of Bigtop Install, build, integration testing on AWS Ubuntu instance.

Week 6: Repetition of Bigtop install, build and integration testing on an AWS Ubuntu instance; basic Map Reduce programming using the Eclipse Map Reduce plugin and Eclipse running in an AWS instance; how to run Bigtop integration tests inside Eclipse; review of DEB files, reverse engineering the Hadoop distribution DEB file format vs. the Bigtop file format.

Victor: Bigtop integration testing demo in the AWS cloud and a VirtualBox instance.

Vijay: Getting Hama to run, and basic DEB files.

Week 7 (1/28/12): Map Reduce code review, integration testing and AWS review.

Week 8 (2/4/12): More Map Reduce programming and presentations. Map Reduce is the first widely accepted programming model for distributed systems built from commodity PC-grade hardware. While many programs do not fit this model, it is important to develop proficiency in it in order to debug and run programs inside Hadoop clusters at scale, specifically programs more complex than a merge sort, which can exhibit hot spots (for example, a skewed key distribution that sends most of the data to a single reducer) if they are not designed correctly.
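To make the model concrete, here is a minimal, single-process Java sketch (Java 9+) of the map, shuffle and reduce phases for a word count. It does not use the Hadoop API and runs no cluster; the class `MiniMapReduce` and its methods are hypothetical names chosen for illustration. The `partition` method follows the hash-modulo idea of Hadoop's default hash partitioning, and `reducerLoad` counts how many pairs each reducer would receive, so a skewed key shows up as an unbalanced load, i.e. a hot spot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of the Map Reduce model (word count).
 * Not Hadoop: no HDFS, no Job/Mapper/Reducer classes, one JVM.
 */
public class MiniMapReduce {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Partition: pick the reducer for a key by hashing it (non-negative hash modulo reducer count).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum all counts emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        // Shuffle: group every emitted value by key; TreeMap keeps keys sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    // Number of (key, value) pairs each reducer would receive; a large imbalance is a hot spot.
    public static int[] reducerLoad(List<String> lines, int numReducers) {
        int[] load = new int[numReducers];
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                load[partition(kv.getKey(), numReducers)]++;
            }
        }
        return load;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the cat sat", "the dog sat", "the the the");
        System.out.println(wordCount(input));
        System.out.println(Arrays.toString(reducerLoad(input, 2)));
    }
}
```

Running `main` prints `{cat=1, dog=1, sat=2, the=5}` followed by the per-reducer load. The exact split depends on `String.hashCode()`, but all five occurrences of "the" always land on the same reducer, which is how a single frequent key turns into a hot spot at cluster scale.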

Week 9 (2/11/12): Integration testing; development of HBase/Hive/Pig backend code; Map Reduce graph algorithms; more Hama DEB file development.

Week 10 (2/25/12): Integration testing; MR/Hama test code; Hama/Nutch DEB file development.

Guest lectures to come.

See what progress you can make after installing Bigtop. Follow the directions in the README and debug as needed.

Per the Biocurious website policy for using the space, the first visit is free and Biocurious membership is required from the second visit on. This is not a charge collected by this meetup group or by any individual, contributor or participant in this group, either in full or in any fraction thereof. Please join on the Biocurious website.

http://apachebigtop.pbworks.com/w/page/48434924/FrontPage

Silicon Valley Hands On Programming Events
Ground Floor Silicon Valley
2030 Duane Avenue · Santa Clara, CA