Re: [LACloud-Computing] Elastic Filesystems

From: Jordan M.
Sent on: Monday, January 25, 2010 3:02 PM
Darren,

You will likely be disappointed, because nothing perfect exists. A lot depends on the application, as several half-@$$ components exist that could potentially be hacked together.

HDFS - Hadoop FS

Designed for huge files. Small files are supposed to be slow. Dedicated master with no hot failover last time I looked. The FUSE interface is supposed to be slow. Ask Allen -- he worked with it. It also appears to be pretty robust, but most users do batch processing on it rather than long-term archiving.

Lustre

Fastest I have used, but no real replication, so data nodes are a single point of failure. Ideal for fast scratch, but I wouldn't trust my data to it long term unless you do replication outside of Lustre. Has occasional crashes and issues, but pretty robust overall (in a relative sense).

Gluster

I evaluated it several times and had bad experiences with performance and reliability. I haven't tried it more recently than 6 months to 1 year ago, so perhaps things have changed. As of my last evaluation it was the most promising option for a mountable/replicated filesystem, but it was nowhere near prime time on large deployments.

MogileFS

Used it at TinyTube. Had lots of little issues, but it ultimately worked as expected (though I had to implement several workarounds). Not mountable, though I'm not sure whether a FUSE extension exists. It is more of an API to do puts and gets, with a MySQL DB to track metadata and storage nodes that files are replicated across. Nice in that files are stored on the disks of the replica nodes without any special encoding, so in the worst case you can copy out the FID to recover.
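To make the put/get-plus-metadata idea concrete, here is a rough toy sketch in Python. It is not the MogileFS API: the class ToyTracker, the helper make_nodes, the round-robin replica choice, and the file naming are all made up for illustration, a plain dict stands in for the MySQL metadata DB, and local directories stand in for storage nodes.

```python
import itertools
import tempfile
from pathlib import Path


def make_nodes(root, count):
    """Create `count` directories under `root`; each one plays a storage node."""
    nodes = []
    for i in range(count):
        node = Path(root) / f"node{i}"
        node.mkdir()
        nodes.append(node)
    return nodes


class ToyTracker:
    """Hypothetical tracker: maps keys to a file ID and the nodes holding copies."""

    def __init__(self, nodes, replicas=2):
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = list(nodes)
        self.replicas = replicas
        self.metadata = {}  # key -> (fid, [node dirs]); stand-in for the MySQL DB
        self._next_fid = itertools.count(1)

    def _path(self, node, fid):
        # Illustrative naming only: the raw bytes live in one ordinary file.
        return node / f"{fid:010d}.fid"

    def put(self, key, data):
        fid = next(self._next_fid)
        start = fid % len(self.nodes)
        # Assumed policy: pick `replicas` distinct nodes round-robin.
        chosen = [self.nodes[(start + i) % len(self.nodes)]
                  for i in range(self.replicas)]
        for node in chosen:
            self._path(node, fid).write_bytes(data)  # stored without any encoding
        self.metadata[key] = (fid, chosen)
        return fid

    def get(self, key):
        fid, holders = self.metadata[key]
        for node in holders:
            path = self._path(node, fid)
            if path.exists():  # fall through to the next replica if one is gone
                return path.read_bytes()
        raise FileNotFoundError(key)


def demo():
    """Store one object on two of three toy nodes and read it back."""
    with tempfile.TemporaryDirectory() as root:
        tracker = ToyTracker(make_nodes(root, 3), replicas=2)
        fid = tracker.put("videos/clip-1", b"raw bytes")
        return fid, tracker.get("videos/clip-1")
```

Because each replica is an ordinary file containing the raw bytes, losing the metadata in this toy model still leaves the data readable straight off a node's disk, which is the recovery property described above.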

Terrastore
Dokan - FUSE for Windows
XtreemFS

No clue. Any feedback?

Cassandra
Project Voldemort

They are supposed to be more key-value stores than a traditional filesystem. Sort of like HBase or a redundant/persistent Memcache, from what I have heard. One of my co-workers recently looked at them and chose HBase (he was looking for more of a distributed DB), so if you are interested, I can set up a conference call.
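As a rough illustration of the difference (a generic toy, not the API of Cassandra, Voldemort, or HBase; all names below are hypothetical): a key-value store hands back whole values by opaque key, so filesystem-style operations such as partial reads or directory listings have to be emulated on top.

```python
class ToyKeyValueStore:
    """Hypothetical in-memory store: opaque string keys, whole-value reads and writes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def keys_with_prefix(self, prefix):
        # There is no directory tree; a "listing" is just a scan over key names.
        return sorted(k for k in self._data if k.startswith(prefix))


def read_range(store, key, offset, length):
    """Emulate a partial file read: fetch the whole value, then slice it."""
    return store.get(key)[offset:offset + length]
```

A mountable filesystem would normally give you seek and directory semantics directly, which is part of why these stores are a different kind of tool.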

Ceph

Last time I spoke to the developer (~1 year ago), he made it very clear that it was not recommended for production use. Not sure if this has changed.

You may also want to do some digging for the file store Caltech developed for LIGO. It looked interesting and, if I recall correctly, it was open source, written in Python, and had an architecture similar to MogileFS.

Please share your notes as well. It's been about a year since I looked (and chose Lustre), so would like to find out where things are nowadays.

Jordan
