*Note: This meetup is for those who want to get their hands dirty with the technology and need some help. If you are content to watch a presentation and assign someone else to install Hadoop, this may not be for you. I can arrange a presentation of Ambari through a webinar for those interested (message me).
I will guide groups of people through a Hadoop installation. Because we only have so much time in one of these sessions, I will be doing an Apache Ambari install of Hadoop. As the automated installer goes through its work, we will:
- Identify the prerequisites for installation on all (or most) Hadoop clusters
- Inspect the various moving parts
- Identify key configuration files
- Identify key tuning knobs
- Smoke-test the cluster
- See how to manage the cluster post-installation with Ambari (just the basics due to time)
While a manual install of the components would be the most educational, it is simply too time-consuming for one of our sessions. Both automated and manual installations occur in production clusters, and I think this session will be valuable for understanding the installation points common to all Hadoop distributions.
Due to time, I will not be covering the setup of secure Hadoop with Kerberos integration. That will be the subject of a future meetup, as it will require some time to go over the background and implementation.
As Hortonworks Data Platform (HDP) has Apache Ambari included with it, we will start with that distribution. You could also pull the Apache Ambari and Apache Hadoop projects directly, but it is just more convenient to use the HDP distro. For those who wish to do so, you can instead use the closed-source (but useful) free version of Cloudera Manager (CM) to install CDH on your cluster; the prerequisites will be the same. I will not cover the CM install in my presentation as it is closed source, but I will answer questions and provide help for those who want to use it.
To help speed things along, I will be preparing VMware images or Amazon EC2 images. Anyone who has 16 GB or more of memory on their laptop will be able to run the VMware image. If you want to be one of the group leaders and you have a meaty laptop, contact me before the event to get some prerequisite networking and DNS setup out of the way. More details to come...
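For anyone who wants a head start on the DNS prerequisite, here is a minimal sketch of one common check: that a node's hostname resolves to an IP address and that the IP resolves back to the same name. It is not part of the session materials, and the function name `check_dns` and its injectable `forward`/`reverse` parameters are made up for illustration. It uses only Python's standard `socket` module.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that hostname -> IP -> hostname round-trips consistently.

    `forward` and `reverse` default to the real resolver calls; they are
    parameters only so the check can be exercised without a network.
    Returns a tuple (ok, message).
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed for {hostname}: {exc}"

    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        name, _aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup failed for {ip}: {exc}"

    if name.lower() != hostname.lower():
        return False, f"mismatch: {hostname} -> {ip} -> {name}"
    return True, f"ok: {hostname} <-> {ip}"
```

On a real node you might call `check_dns(socket.getfqdn())` and print the returned message. A mismatch usually points at an `/etc/hosts` or DNS entry worth fixing before the install.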
To prepare for the session, I have provided some relevant links to documentation below:
HDP Ambari-based install (what we will be doing):
Manual Apache Hadoop Cluster Setup:
Or the Hadoop 2.0.2 alpha release:
Cloudera Manager (Free version) install:
Cloudera Manual install:
Update! If you want, you can now try installing Hadoop on Windows with HDP: