Hadoop Cluster Install-athon

*Note: This meetup is for those who want to get their hands dirty with the technology and need some help. If you are content to watch a presentation and assign someone else to install Hadoop, this may not be for you. I can arrange an Ambari presentation via webinar for those interested (message me).

I will guide groups of people through a Hadoop installation. As we only have so much time in one of these sessions, I will be doing an Apache Ambari install of Hadoop. As the automated installer goes about its work, we will:

- Identify the prerequisites for installation on all (or most) Hadoop clusters

- Inspect the various moving parts

- Identify key configuration files

- Identify key tuning knobs

- Smoke-test the cluster

- See how to manage the cluster post-installation with Ambari (just the basics due to time)
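For the configuration-file and tuning-knob items: Hadoop 1.x keeps most of its settings in XML "site" files such as core-site.xml, hdfs-site.xml and mapred-site.xml, each a list of name/value property pairs. A minimal sketch of pulling a few knobs out of such a file, assuming a hypothetical excerpt and an illustrative (not exhaustive) knob list:

```python
import xml.etree.ElementTree as ET

# Hypothetical site-file excerpt (hdfs-site.xml and mapred-site.xml properties merged for brevity).
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
"""

# A few Hadoop 1.x knobs worth inspecting after an install (illustrative, not exhaustive).
KNOBS_OF_INTEREST = [
    "dfs.replication",
    "dfs.block.size",
    "mapred.tasktracker.map.tasks.maximum",
    "mapred.tasktracker.reduce.tasks.maximum",
]


def read_site_properties(xml_text: str) -> dict:
    """Return {name: value} for every <property> element in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name:  # skip malformed entries without a <name>
            props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    props = read_site_properties(SAMPLE_SITE_XML)
    for knob in KNOBS_OF_INTEREST:
        # Knobs absent from this file come from another site file or Hadoop's built-in defaults.
        print(f"{knob} = {props.get(knob, '(not set here)')}")
```

Reading the files this way is only for inspection; on an Ambari-managed cluster, changes should go through Ambari so they are not overwritten on the next restart.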

While a manual install of the components would be the most educational, it is simply too time-consuming for one of our sessions. Both automated and manual installations are used in production clusters, and I think this session will be valuable for understanding the installation steps common to all Hadoop distributions.

Due to time, I will not be covering the setup of secure Hadoop with Kerberos integration. This is the subject of a future meetup as it will require some time to go over the background and implementation.

As Hortonworks Data Platform (HDP) includes Apache Ambari, we will start with that distribution. You could also pull the Apache Ambari and Apache Hadoop projects directly, but the HDP distro is simply more convenient. Alternatively, you can use the closed-source (but useful) free version of Cloudera Manager (CM) to install CDH on your cluster; the prerequisites are the same. I will not cover the CM install in my presentation as it's closed source, but I will answer questions and provide help for those who want to use it instead.

To help speed things along, I will be preparing VMware images or Amazon EC2 images. Anyone with 16GB+ of memory on their laptop will be able to run the VMware images. If you want to be one of the group leaders and have a meaty laptop, contact me before the event to get some prerequisite networking and DNS setup out of the way. More details to come...
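Why 16GB is the comfortable threshold becomes clearer with a simple RAM budget. A rough sketch, assuming (hypothetically) that the laptop's own OS keeps about 4GB and each Hadoop VM gets about 3GB; real per-node needs depend on which daemons each VM runs:

```python
def nodes_that_fit(host_ram_gb: int, host_reserve_gb: int, ram_per_vm_gb: int) -> int:
    """Number of VMs of ram_per_vm_gb that fit after keeping host_reserve_gb for the host OS."""
    if ram_per_vm_gb <= 0:
        raise ValueError("ram_per_vm_gb must be positive")
    usable_gb = host_ram_gb - host_reserve_gb
    return max(0, usable_gb // ram_per_vm_gb)


if __name__ == "__main__":
    # Hypothetical sizing: 4GB kept for the laptop's own OS, 3GB per Hadoop VM.
    for laptop_gb in (8, 16):
        print(laptop_gb, "GB laptop ->", nodes_that_fit(laptop_gb, 4, 3), "VM(s)")
```

Under these assumed numbers a 16GB laptop holds four nodes while an 8GB laptop holds only one, which is why trimming daemons (shrinking the per-VM footprint) matters on smaller machines.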

To prepare for this, I have provided some relevant links to documentation below:

HDP Ambari-based install (what we will be doing):

http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_using_Ambari_book/content/ambari-chap1.html

HDP Manual-install:

http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_installing_manually_book/content/rpm-chap1.html

Manual Apache Hadoop Cluster Setup:

stable:

http://hadoop.apache.org/docs/stable/cluster_setup.html

or the Hadoop 2.0.2 alpha release:

http://hadoop.apache.org/docs/current/

Cloudera Manager (Free version) install:

https://ccp.cloudera.com/display/FREE41DOC/Cloudera+Manager+Free+Edition+Installation+Guide

Cloudera Manual install:

https://ccp.cloudera.com/display/CDH4DOC/CDH4+Installation+Guide

Update! If you want, you can now try installing Hadoop on Windows with HDP:

http://hortonworks.com/thankyou-hdp11-win/


  • Pankaj T.

    Adam, as we discussed after the install-athon, I have added details on configuring SSH to part 1 of my blog: http://pthakkar.com/2013/03/installing-hadoop-apache-ambari-amazon-ec2/
    I will continue with more details in part 2.

    March 24, 2013
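Ambari's automated host registration logs in from the Ambari server to each node over SSH using a private key (typically as root), so password-less SSH has to work before the wizard can proceed. A minimal sketch for verifying that from the Ambari server, assuming OpenSSH's ssh client is on the PATH; the helper names are hypothetical:

```python
import subprocess


def ssh_check_command(host: str, user: str = "root", timeout_s: int = 5) -> list:
    """Build an ssh command that fails instead of prompting when key-based login is not set up."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                  # never fall back to a password prompt
        "-o", f"ConnectTimeout={timeout_s}",    # give up quickly on unreachable hosts
        f"{user}@{host}",
        "true",                                 # no-op remote command; exit code 0 means success
    ]


def has_passwordless_ssh(host: str, user: str = "root") -> bool:
    """Return True if `ssh user@host true` succeeds without any interactive prompt."""
    result = subprocess.run(ssh_check_command(host, user), capture_output=True)
    return result.returncode == 0
```

For example, has_passwordless_ssh("ip-10-0-0-11.ec2.internal") (a hypothetical EC2 private hostname) should return True for every node before you start the wizard. Because BatchMode also suppresses host-key confirmation, connect to each host once interactively (or pre-populate known_hosts) so an unknown host key does not make the check fail.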

  • Adam M.

    Here is the link for Amazon's DNS service that was mentioned last night: http://aws.amazon.com/route53/

    March 21, 2013
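Whether names come from Route 53 or from /etc/hosts, each node should resolve every cluster member's fully qualified domain name (FQDN) to a reachable, non-loopback address; a node's own FQDN pointing at 127.0.0.1 is a common source of trouble because daemons may then bind to the loopback address. A minimal sketch of that check against /etc/hosts-style text, assuming hypothetical host names under a made-up thug.local domain:

```python
import ipaddress

# Hypothetical /etc/hosts contents on one node; thug.local is a made-up domain.
SAMPLE_HOSTS = """
127.0.0.1   localhost localhost.localdomain
127.0.0.1   master.thug.local master
10.0.0.12   worker1.thug.local worker1   # worker node
"""


def parse_hosts(text: str) -> dict:
    """Map each hostname/alias in /etc/hosts-style text to its IP (first entry wins)."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


def hosts_problems(text: str, cluster_fqdns: list) -> dict:
    """Return {fqdn: problem} for cluster hosts that are missing or point at loopback."""
    mapping = parse_hosts(text)
    problems = {}
    for fqdn in cluster_fqdns:
        ip = mapping.get(fqdn.lower())
        if ip is None:
            problems[fqdn] = "no entry"
        elif ipaddress.ip_address(ip).is_loopback:
            problems[fqdn] = f"maps to loopback {ip}"
    return problems


if __name__ == "__main__":
    cluster = ["master.thug.local", "worker1.thug.local", "worker2.thug.local"]
    for host, problem in hosts_problems(SAMPLE_HOSTS, cluster).items():
        print(f"{host}: {problem}")
```

This only inspects the file's contents and does not consult DNS; with Route 53 you would compare actual lookups on each node (for example with socket.gethostbyname) instead.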

  • Adam M.

    Folks, I have put together a cheat sheet / guide for setting up the instances on EC2 and installing Ambari and HDP 1.2.2. One thing to note: last night a significant patch was uploaded to the HDP repositories, moving it from 1.2.1 to 1.2.2. I based the following document on 1.2.2, so make sure you start with fresh instances or at least fresh repositories: https://dl.dropbox.com/u/7852876/2013_March_21_EC2_HDP1.2.2_AmbariInstallNotes.pdf

    March 21, 2013

  • Mahan

    It was a great learning experience.
    Thanks Adam & Scalar for giving us the space.
    I would like to see more sessions like this; there is a lot of interest from newbies (like me). I believe Adam was overwhelmed with the questions he got from the newbies. More volunteers helping others would have helped. Just an afterthought.

    March 21, 2013

    • Adam M.

      My organized volunteers could not make it that night unfortunately. :(

      March 21, 2013

  • Rajiv A.

    Adam was great with his patience. A very good organizer. Some suggestions for next time: mention the prerequisite skills required for each meetup, assign mandatory homework (pre-installation steps or concepts that should be known) and, as Mahan said, have volunteers to help.

    March 21, 2013

    • Adam M.

      The prerequisites for this session were not well defined. My apologies. Next time I will put something more useful than "time to get your hands dirty" in the meetup description.

      March 21, 2013

  • Pankaj T.

    It was a really great experience. Thanks Adam and Scalar for it.

    March 21, 2013

  • Edwin C.

    FYI, here is a recently released dataset that would be fun to explore... 9TB raw, 568GB compressed...

    http://internetcensus2012.bitbucket.org/download.html

    Might be a good use case to create some simple UDFs?

    March 21, 2013

  • A former member

    I enjoyed the session and thanks again to Adam for the pizza and the patience in taking us through the install.

    I thought we were all going to launch EC2 instances from the same pre-configured AMI, then install and fool around with Ambari, so I'd respectfully like to ask why we started with the plain-vanilla AMI and spent most of the time on server configuration.

    I'd love to help create a somewhat pre-configured THUG AMI for future install sessions (and for fiddling with at home) so that we can save time for everyone. Anyone interested?

    March 21, 2013

    • Edwin C.

      I can help out. I have a lot of experience with EC2 and was a sysadmin a lifetime ago... Good chance to try out CloudFormation as well.

      March 21, 2013

  • Hardik

    Great

    March 21, 2013

  • Hardik

    Simply excellent! Although I did not get the cluster installation done (not Linux savvy yet), it was a very good learning experience to get my hands dirty with a real Hadoop cluster installation. We should continue from where we left off and take it to the next level.

    As usual, hats off to Ad, and it was wonderful to meet great minds at the meetup.

    March 21, 2013

  • Raghu S.

    My persistent cough made me skip the event. Sorry about that.

    March 20, 2013

  • Adam M.

    March 20, 2013

  • Faisal A.

    Stuck in traffic. What is the best spot to park?

    March 20, 2013

    • Michael T

      323 Richmond St. East, at Sherbourne

      March 20, 2013

    • Michael T

      Sorry - google gave me the wrong address. But it's still at Richmond @ Sherbourne

      March 20, 2013

  • Michael T

    If you arrive after 7pm, please call my cell so I can let you into the building (doors will be locked). [masked]

    March 20, 2013

  • A former member

    Can't wait!

    March 20, 2013

  • Adam M.

    Folks, I'm on the Bloor line now, but I might be a little late. There will be pizza by 7pm.

    March 20, 2013

  • Mahan

    Adam,
    Apart from AWS, which installations will we be trying today? I see installation guidelines for HDP, CDH4 and Apache Hadoop.
    When I try to install HDP 1.1 for Windows, I get the following error:
    http://superuser.com/questions/567196/install-hdp-1-1-got-error

    March 20, 2013

    • Mahan

      I found this link to be very useful: http://hortonworks.co...

      March 20, 2013

    • Adam M.

      I will not be covering the Windows installation today in the interest of time. On my way to the meetup now....

      March 20, 2013

  • Faisal A.

    How much disk space should we ensure to have on our laptops for the VMware images?

    March 20, 2013

    • Adam M.

      Unless you have 16GB to run a few VMware instances, I wouldn't worry about it. :) For the record, my 4-node cluster will use about[masked]GB in total, depending on how much data I put on it. I usually use the CentOS[masked]bit minimal for all of my VMware nodes. I will be installing HDP on EC2 tonight, though.

      March 20, 2013

    • Faisal A.

      Great thanks!

      March 20, 2013

  • Neil B.

    Sorry, last minute cancel due to a conflict!

    March 20, 2013

  • A former member

    Adam, if my laptop doesn't support this config (low memory), can I still benefit from this session?

    March 20, 2013

    • Adam M.

      Just have an Amazon AWS account set up and you'll be fine. You can buddy up in a group as well.

      March 20, 2013

    • A former member

      Grateful, thank you.

      March 20, 2013

  • A former member

    Hope to be able to make it. :-)

    March 20, 2013

  • David L.

    Sorry to miss the session. Not well.

    March 20, 2013

  • Shi G.

    Sorry, I can't come this time.

    March 20, 2013

  • Raghu S.

    Thank you, I will be there!

    March 19, 2013

  • Cas A.

    I'm coming for the first time, so I guess I need more info.
    Can you give me some instructions how to prepare for the session and what are the prerequisites?
    Thanks

    March 19, 2013

  • Cas A.

    Alternatively, I could remotely connect to my server at home. Would that work?
    Cas

    March 19, 2013

    • Edwin C.

      If you can run VMware images at home, you can always connect back to your home server. Most VMware images can be converted to work on KVM and VirtualBox as well.

      March 19, 2013

    • Adam M.

      Hardik, that's right.

      March 19, 2013

  • David L.

    I do not have access to a laptop. Does that mean I should not attend? Please let me know. This is my first meetup like this and I am interested in attending and working with someone if possible.

    March 19, 2013

    • Adam M.

      Just buddy up with someone who is doing the exercise.

      March 19, 2013

  • Cas A.

    I did not realize it would be a hands-on session. My notebook is rather small (Mac Air with Win7). Would I benefit from attending?
    Cas

    March 19, 2013

  • Hardik

    OK, so I am going to bring my Win7 laptop with 8GB RAM and presumably use an AMI Linux image; is that all that's required to kick off the install-athon?

    March 19, 2013

  • Edwin C.

    Shouldn't cost much; spot instances are pretty cheap. I've had good luck getting m2.4xlarge instances for multiple days at $0.14/hour.

    March 14, 2013

  • Adam M.

    Guys, I'm likely going to create some base AMI images to work off of. I'll ask Hortonworks to sponsor some amazon gift cards for the cost of running it.

    March 14, 2013

  • Hardik

    Just to set my expectations ahead of time: I have 8GB of RAM. Is that sufficient to handle the VMware image?

    March 13, 2013

    • Adam M.

      16GB is best. I think we will probably have to trim services (fewer daemons).

      March 13, 2013

  • Sunil A.

    I'm attending

    March 6, 2013

  • Tim C.

    In Boston but wish I could make this.

    March 4, 2013

  • Jason C.

    Sadly I'll be out of town.

    February 20, 2013

  • Edwin C.

    For those who haven't used AWS/EC2, now might be a good time to pick it up. Spot pricing for an 8-core, 68GB (m2.4xlarge) instance has been pretty steady at $0.14/hour. It's an easy, cheap way to play with Hadoop.

    February 15, 2013
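The $0.14/hour spot figure makes session costs easy to estimate up front. A back-of-the-envelope sketch, assuming (hypothetically) a 4-node cluster kept up for a 4-hour evening and a flat price; actual spot prices move with demand, and storage and data transfer are billed separately:

```python
from decimal import Decimal


def spot_cost(price_per_hour: Decimal, instances: int, hours: int) -> Decimal:
    """Rough compute cost at a flat hourly spot price (ignores price changes, storage and transfer)."""
    return price_per_hour * instances * hours


if __name__ == "__main__":
    price = Decimal("0.14")  # the m2.4xlarge spot price quoted in the comments
    print("4 nodes x 4 hours:", spot_cost(price, 4, 4))   # hypothetical meetup evening
    print("1 node x 3 days:", spot_cost(price, 1, 72))    # hypothetical multi-day run
```

Decimal keeps the dollar arithmetic exact; under these assumptions the evening comes to $2.24 and the three-day single node to $10.08.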

  • David L.

    I like hands on sessions like this more.

    February 9, 2013

  • A former member

    Can't wait!

    February 9, 2013

Our Sponsors

  • IBM

    Meeting facilities, expert speakers, free product, books and education.

  • Big Data University

    Free on-line courses in Hadoop and big data related technologies.

  • Cloudera

    10% off training for Toronto Hadoop User Group members.

  • Hortonworks

    Food, speakers, beverages

  • T4G

    Hosting Meeting locations and providing relevant speakers

  • Paytm Labs

    Paytm Labs offers a venue for the THUG.
