Practical MapReduce Programming | MapRed-a-thon | MapReduce Patterns


Details
Hello everyone!
After a very successful maiden meetup, it's about time we had the next one, keeping our promise of a bi-monthly meetup.
The first meetup covered the theory of what MapReduce and Hadoop are, and it's time to take that knowledge and apply it to a set of use-cases. We are now ready for a fully hands-on session on MapReduce programming, built around some simple but effective learning use-cases.
________________________
IMP: Please fill in the form below to attend the meetup, so that we have an exact count of the attendees.
http://bit.ly/meetup-form
________________________
The meetup will comprise the following sections.
• Practical MapReduce programming.
A practical introduction to MapReduce programming (in Java and Python) and an explanation of the problem statements
• Session on writing a simple MapReduce program in Java
• Understanding the various pieces of a MapReduce Program
• Hadoop streaming introduction in Python for non-Java programmers
Time - 1 hour: 10:30 AM to 11:30 AM.
• MapRed-a-thon
A set of questions would be posted for teams to solve based on some standard, open datasets.
• Team distribution, tasks, and explanation of the use-cases. We hope that teams can be randomly distributed (randomness is ambiguous ;) ), in groups of 3-5.
• Once a team completes its work, it should also be able to show its approach in a presentation.
• The questions for the teams would have different levels, and the ultimate task of a team is to solve as many use-cases as it can.
• A Hadoop cluster would be in place and different users would be created for different teams. Even if different users are not created, access to the Hadoop cluster would be provided for the participants to submit their Hadoop jobs. The data files would already be present.
• We would only consider the correctness of the output, and not the performance. Thus, only functional requirements would be looked at.
• This is not a competition, but a workshop to help people get up to speed quickly with MR programming. A workshop with individual participants could have been arranged, but a team-based format makes for quicker learning in a collaborative atmosphere.
An explanation of the use-cases is given below.
Time - 3 hours: 11:30 AM to 02:30 PM.
• Closing Session and Future...
A Q&A session and a discussion of approaches, with presentations from the various teams on how they solved the given problems. This should spark different ideas.
Time - 1 hour: 02:30 PM to 03:30 PM.
________________________
Laptops | Notebooks | Books | Reference Material
As this is a purely hands-on session, I request everyone to bring a laptop along. Yes, a group may finally use a single machine to develop their solution, but it helps to have multiple people look at various resources (ebooks, search engines, blogs, mailing lists, etc.).
________________________
Use-cases
The use-cases would be a set of SQL statements on some datasets. SQL is one of the most studied and widely used languages, is widely understood, and is even part of the curriculum of most college courses. Implementing MapReduce programs for various SQL statements is therefore an ideal way to learn and understand how Hadoop breaks up the data and parallelizes computation, and it also helps someone who knows only SQL to understand the paradigm well.
The questions would be in levels and drawn from various datasets. The datasets would be explained, and the teams will be given some records to play around with.
Note: The dataset would be structured and would be primarily an easy-to-understand dataset, keeping the wide variety of backgrounds in mind.
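To give a feel for how a job breaks up the data and parallelizes computation, here is a minimal in-memory sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names, the comma-separated record layout, and the sample records are all assumptions made purely for illustration. It counts records per key, roughly what a GROUP BY with COUNT(*) would do.

```python
from collections import defaultdict
from itertools import chain

def split_input(lines, num_splits):
    """Cut the input into contiguous chunks, loosely mimicking input splits."""
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def map_phase(split):
    """Mapper: emit one (key, 1) pair per record; the key is the first CSV field."""
    for line in split:
        key = line.split(",")[0]
        yield key, 1

def shuffle(pairs):
    """Shuffle: group values by key so each key is handled by one reduce call."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all values seen for one key into a single result."""
    return key, sum(values)

def run_job(lines, num_splits=2):
    """Run map, shuffle, and reduce in sequence on an in-memory list of records."""
    splits = split_input(lines, num_splits)
    # On a real cluster each split would go to its own mapper task in parallel.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

if __name__ == "__main__":
    # Hypothetical records: "city,amount"
    records = ["pune,10", "mumbai,20", "pune,5", "delhi,7"]
    print(run_job(records))
```

The point of the sketch is only the shape of the computation: mappers see one split each and know nothing about the others, and the shuffle is what brings records with the same key together.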
Level 1 - SELECT COUNT(*) FROM table
Level 2 - SELECT column FROM table
Level 3 - SELECT * FROM table WHERE ?
And subsequent levels.
...
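As a rough idea of what the first three levels might look like in the Hadoop streaming style (text records in, tab-separated key/value lines out), here is a hedged Python sketch. The function names, the comma separator, the column index, and the sample rows are assumptions for illustration only, not the actual workshop datasets or solutions. The functions take any iterable of lines so they can be tried locally.

```python
def count_mapper(lines):
    """Level 1 mapper: emit the same key with a 1 for every non-empty record."""
    for line in lines:
        if line.strip():
            yield "count\t1"

def count_reducer(lines):
    """Level 1 reducer: add up the 1s; every pair shares one key, so one total results."""
    total = 0
    for line in lines:
        _key, value = line.rstrip("\n").split("\t")
        total += int(value)
    yield f"count\t{total}"

def project_mapper(lines, column=0, sep=","):
    """Level 2 mapper: emit only the chosen column; no reducer is required."""
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue
        fields = record.split(sep)
        if column < len(fields):
            yield fields[column]

def filter_mapper(lines, predicate):
    """Level 3 mapper: pass through whole records that satisfy the WHERE condition."""
    for line in lines:
        record = line.rstrip("\n")
        if record and predicate(record):
            yield record

if __name__ == "__main__":
    # Hypothetical rows: "id,name,city"
    rows = ["1,alice,pune\n", "2,bob,mumbai\n", "3,carol,pune\n"]
    print(list(count_reducer(count_mapper(rows))))
    print(list(project_mapper(rows, column=1)))
    print(list(filter_mapper(rows, lambda r: r.split(",")[2] == "pune")))
```

In an actual streaming job, each mapper or reducer would typically live in its own script that reads sys.stdin and prints each yielded line. Levels 2 and 3 can run as map-only jobs, since no grouping across records is needed.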
We encourage you to solve as many levels as possible. It would be even better if you solve any use-case which your team finds interesting :)
________________________
We will be uploading the proposed solutions to a GitHub account dedicated to this meetup, and we encourage you to share the code you write. This will serve as a future reference for both the authors of the code and their team members.
Hope to see you at this exciting second edition of the BigData meetup. I'd like to thank all the sponsors for helping us make the BigData meetup bigger and better.
Let's make some BigNoise.
Thank you!
Varad
Some references to help you boot up. :)
[1] Eclipse Setup for Hadoop Development (http://www.orzota.com/eclipse-setup-for-hadoop-development/)
[2] Step-by-step MapReduce Programming (http://www.orzota.com/step-by-step-mapreduce-programming/)
[3] Hands-on MapReduce Programming (http://www.slideshare.net/orzota/handson-mapreduce-programming)
[4] Writing an Hadoop MapReduce Program in Python (http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/)
