Los Angeles Hadoop Users Group (LA-HUG) Message Board › Questions for Hadoop Community
A former member
I'm part of a small group of UCI business grad students, and we are looking at the "Hadoop market" and the challenges related to it.
I'm hoping to get a discussion going based on the questions below.
Questions for Hadoop communities
1) Which do you think is more attractive for Hadoop, commodity HW or customized HW? Why?
2) If customized HW could improve Hadoop's current performance, do you think users would be interested in it?
A former member
Thanks for reaching out to us!
I'd be happy to offer a personal opinion (based on what I've seen).
1) In general, commodity hardware is preferable. Most organizations have commodity servers that have been reclaimed from other business uses (serving up web content, for example) and added to Hadoop clusters to achieve business value by enabling fast computations.
Part of Hadoop's main benefit is that if our computation pipeline isn't fast enough, we can add machines to the cluster on the fly to get better performance. So if a job takes 10 hours on one machine and we can split it evenly over five machines so that each worker machine completes its share in 2 hours, we've achieved a 5x speedup (in the ideal case, ignoring coordination overhead). This ability to resize clusters matters even more if we're using commodity hardware, since it's easier to get business approval to re-use existing machines than to purchase new hardware.
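To make the arithmetic above concrete, here is a rough Python sketch. The function names are made up for illustration, and the 90% parallel fraction in the second call is purely an assumed number, not a measurement from any real cluster. The first function reproduces the ideal case from my example; the second uses Amdahl's law to show what happens if part of the job can't be split across machines.

```python
from typing import Tuple


def ideal_speedup(serial_hours: float, workers: int) -> Tuple[float, float]:
    """Ideal case: the job divides evenly and there is no overhead.

    Returns (hours per worker, speedup factor).
    """
    per_worker_hours = serial_hours / workers
    speedup = serial_hours / per_worker_hours
    return per_worker_hours, speedup


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup when only part of the job can run in parallel.

    parallel_fraction is an assumed input between 0 and 1.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# The 10-hour job split over five machines.
print(ideal_speedup(10, 5))             # (2.0, 5.0)

# Hypothetical: only 90% of the job parallelizes.
print(round(amdahl_speedup(0.9, 5), 2))  # 3.57
```

So in practice the speedup from adding machines is usually somewhat less than the machine count, but the general point about resizing the cluster still holds.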
2) Of course Hadoop users would find it interesting. For instance, Amazon recently started offering specialized Elastic MapReduce services, like high-performance computing nodes with GPUs (which are fast for certain types of problems). http://aws.amazon.com...
That also brings up a different point: many Hadoop users, both individual and corporate, begin using Hadoop on Amazon EC2/EMR because it requires no upfront investment. Larger companies tend to invest in building clusters in-house once running a cluster on EC2 becomes too expensive. So many developers may be excited by the faster performance of new machines, but they won't have access to this hardware unless cloud providers begin offering it.
It may be interesting market research for you to determine what proportion of users run their Hadoop clusters on cloud providers (Amazon or otherwise).