San Francisco Hadoop Users Message Board › Mar 2011 - schema and metadata management

Mar 2011 - schema and metadata management

A former member
Post #: 8
* Evolvable schemas add complexity to data processing, but they are necessary.
* A good approach for file-based data: use a naming convention to decide where the schema lives (e.g., "foo.schema" corresponds to "foo.txt"). A file named just ".schema" can describe the schema that applies to all files in the directory.
* HBase: if cells can evolve independently, you need to store a schema (or a pointer to one) for each cell.
** Each data cell has a companion schema-pointer cell that provides some id (MD5, counter, etc.) referencing the schema.
** Another table or column family holds the actual JSON schema text (for Avro); its rows are keyed by that id.
** HAvroBase provides functionality built in roughly this fashion for storing Avro-encoded data in HBase.
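
The two conventions in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how HAvroBase or any HBase client actually does it: `find_schema_file` and `SchemaPointerStore` are hypothetical names, the store is a pair of in-memory dicts standing in for the data table and the schema table, and letting a per-file "foo.schema" take precedence over the directory-wide ".schema" is a guess, since the notes do not say which should win.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Optional, Tuple


def find_schema_file(data_file: Path) -> Optional[Path]:
    """Locate the schema for a data file by naming convention.

    Assumption: a per-file schema ("foo.schema" next to "foo.txt") wins
    over a directory-wide ".schema" file. Returns None if neither exists.
    """
    per_file = data_file.with_suffix(".schema")
    if per_file.is_file():
        return per_file
    per_dir = data_file.parent / ".schema"
    if per_dir.is_file():
        return per_dir
    return None


class SchemaPointerStore:
    """In-memory stand-in for the HBase layout (no HBase client involved).

    `cells` plays the role of the data table: each value is stored together
    with a schema id (the "pointer cell"). `schemas` plays the role of the
    separate table / column family holding the JSON schema text by id.
    """

    def __init__(self) -> None:
        self.schemas: Dict[str, str] = {}
        self.cells: Dict[Tuple[str, str], Tuple[str, bytes]] = {}

    def put(self, row: str, column: str, value: bytes, schema: dict) -> str:
        # Canonicalize the schema text so equal schemas get the same id.
        text = json.dumps(schema, sort_keys=True)
        # MD5 is used only as a content fingerprint, as suggested in the notes.
        schema_id = hashlib.md5(text.encode("utf-8")).hexdigest()
        self.schemas.setdefault(schema_id, text)
        self.cells[(row, column)] = (schema_id, value)
        return schema_id

    def get(self, row: str, column: str) -> Tuple[bytes, dict]:
        # Follow the pointer: read the cell, then look up its schema by id.
        schema_id, value = self.cells[(row, column)]
        return value, json.loads(self.schemas[schema_id])
```

Because each cell carries its own schema id, two cells in the same column can be written under different schema versions and still be read back correctly, while identical schemas are stored only once. A counter-based id would work the same way but needs a coordinated allocator; a content hash does not.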