
Re: [hug-uk] Hadoop for Startups

From: Dan H.
Sent on: Friday, May 18, 2012 6:57 PM

On Fri, May 18, 2012 at 1:23 PM, Alex McLintock <[address removed]> wrote:
Hi Folks, 

Although I sometimes attend HUGUK meetups in London, I want to chat more about the practical issues we face running Hadoop in the UK.

In particular, I want to learn from people about using Hadoop in startups.

One thing I want to start pushing more, now that the group is getting bigger, is smaller meetups on specific topics like this. I can set it up so that anyone can create an event, and it goes ahead as a meetup if enough people are interested. I think that might be useful for people meeting about small areas of Hadoop like this; I will ask people next Tuesday to see what they think. We'll still be running the more structured monthly meetups too.
Right now I am trying to come up with reasonable cost estimates, and it is rather hard without first-hand experience. Does anyone have any costs they are willing to share?

I am thinking that Amazon S3/EMR is best for low initial outlay, but I appreciate it won't be the cheapest in the long term.
But I am also trying to work out how many people will be needed, and what they will cost. It looks like I should be seeking a couple of good Java developers able to pick up Hadoop and Hive.

This all very much depends on what you are doing, how much data you have, how frequently it gets processed, and what the CPU load is, so I'm not sure how relevant the costs of other projects would be.

When costing AWS, I tend to prototype and develop the work I'm doing before fully costing it; then, when it moves to production, I can work out what the recurring costs are given how long it takes. EMR is great for this, as you tend to build chunks of work that can be costed as a whole unit. Generally, EMR costs are far above the S3 costs, so you can worry less about storage during development, and only needing the clusters up while you're working also saves a lot.
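The costing approach above (prototype first, time the job, then derive the recurring cost) can be sketched as a small calculation. This is a minimal sketch under stated assumptions, not the spreadsheet mentioned in this thread: it assumes each node is billed an EC2 price plus an EMR surcharge per instance-hour, that partial hours are rounded up to full hours, and that S3 storage is billed per GB-month. The function names and every price in the example are hypothetical placeholders, not real AWS rates.

```python
import math


def emr_run_cost(nodes: int, runtime_hours: float,
                 ec2_hourly: float, emr_hourly: float,
                 round_up: bool = True) -> float:
    """Cost of one EMR run, assuming per-instance-hour billing.

    ASSUMPTION: each node is charged (EC2 price + EMR surcharge) per hour,
    and partial hours are billed as full hours when round_up is True.
    """
    billed_hours = math.ceil(runtime_hours) if round_up else runtime_hours
    return nodes * billed_hours * (ec2_hourly + emr_hourly)


def monthly_cost(run_cost: float, runs_per_month: int,
                 stored_gb: float, s3_gb_month: float) -> float:
    """Recurring monthly cost: repeated EMR runs plus S3 storage.

    Clusters only exist while a run is in progress, so compute is paid
    per run; storage is paid for the whole month.
    """
    return run_cost * runs_per_month + stored_gb * s3_gb_month


if __name__ == "__main__":
    # Hypothetical placeholder prices, NOT real AWS rates.
    run = emr_run_cost(nodes=10, runtime_hours=2.5,
                       ec2_hourly=0.50, emr_hourly=0.125)
    print(run)  # 18.75 (10 nodes x 3 billed hours x 0.625)
    print(monthly_cost(run, runs_per_month=20,
                       stored_gb=100, s3_gb_month=0.125))  # 387.5
```

Once a production job has been timed during prototyping, only `runtime_hours`, `runs_per_month` and the data volume need updating to re-estimate the recurring cost.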

There's a useful spreadsheet I've made, showing how EMR cost relates to the resources and map/reduce slots you get, here.

Note that I haven't filled in all the instance types, and I may not keep it updated as things change.
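The kind of comparison that spreadsheet makes can be sketched by dividing an instance type's hourly price by the number of MapReduce slots it provides, giving a cost per slot-hour. This is a minimal sketch under stated assumptions: the instance names, prices and slot counts below are hypothetical placeholders (real slot counts depend on the Hadoop configuration applied to each instance type), and weighting map and reduce slots equally is a simplification.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str            # hypothetical label, not a real AWS instance type
    hourly_price: float  # placeholder EC2 + EMR price per instance-hour
    map_slots: int       # map slots per node (depends on Hadoop config)
    reduce_slots: int    # reduce slots per node

    def cost_per_slot_hour(self) -> float:
        # Simplification: map and reduce slots are weighted equally.
        return self.hourly_price / (self.map_slots + self.reduce_slots)


def cheapest_per_slot(types: list[InstanceType]) -> InstanceType:
    """Instance type with the lowest cost per slot-hour."""
    return min(types, key=lambda t: t.cost_per_slot_hour())


def nodes_for_map_slots(t: InstanceType, map_slots_needed: int) -> int:
    """Nodes of type t needed to provide at least map_slots_needed map slots."""
    return math.ceil(map_slots_needed / t.map_slots)


if __name__ == "__main__":
    # Hypothetical instance types with placeholder prices and slot counts.
    small = InstanceType("small-example", 0.25, map_slots=2, reduce_slots=1)
    large = InstanceType("large-example", 1.00, map_slots=12, reduce_slots=4)
    best = cheapest_per_slot([small, large])
    print(best.name, best.cost_per_slot_hour())  # large-example 0.0625
    print(nodes_for_map_slots(best, 40))         # 4
```

A cheaper hourly price does not always mean cheaper processing: in this made-up example the larger instance costs four times as much per hour but less per slot-hour.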

Give me a shout if you want to chat in person, or in private. 

Maybe we can come up with enough information to present at a future HUGUK event. 

Alex McLintock
alexmc6 on Skype and Twitter

This message was sent by Alex McLintock ([address removed]) from Hadoop Users Group UK.