June 1, 2010
I like the challenge of working with a lot of data. With so much data being generated these days, there is a tremendous opportunity to mine that data to build interesting products and services. And who doesn't like being able to solve a big problem that seemed too big to consider solving just a few years ago?
I am mostly interested in the technologies around Hadoop (e.g., Hive, Pig, Impala, Drill, SQL on HBase).
We have used Hadoop at Return Path since November 2008 and have been in production with it since early 2010. Over the years we have run a number of distributions (Apache, Cloudera, MapR). Since the beginning of 2012 we have been happily running a 24-node MapR cluster with about 500 TB of storage. We hope to double the size of our cluster by the end of 2012.
We use Pig and Hive. Hive is mostly used by our analytics group for ad-hoc queries. We use Pig both for ad-hoc queries and in our production workflows. We also use HBase and are starting to rely on it pretty heavily. We have roughly 20 TB of compressed data in HBase right now.
My name is Andy Sautins and I have been the CTO at Return Path for the last 14+ years. I am interested in most things technical, but mostly around data and big data.