June 4, 2012
Been there, done that :-) I have a Ph.D. in computer science and artificial intelligence, and have been working mainly with graph data and combinatorial algorithms for the last 10 years or so. I'm familiar with a lot of machine learning, optimization, genetic algorithms, etc. I teach social network analysis and graph theory at George Mason University.
Whatever is appropriate. I've tried 90% of the things currently out there at least once, and used half a dozen in production. I have learned the hard way that you cannot "shoehorn" data into a pattern that is alien to the nature of the data. Normalized RDBMS, MongoDB, CouchDB, HBase, etc. -- all impose a certain expectation of what the data should be shaped like, and going against that grain is a recipe for pain and suffering. Hadoop is not a panacea; not everything is a good candidate for map/reduce. There's huge value in having well-edited "small data" that answers your exact questions instead of a smorgasbord of "big data" that requires an hour of computation to do the most basic operation. There's even more value in front-loading computation so results can be made available instantly. Programming languages: Python, Java, and Clojure are my favorites (in about that order).
Learn more about tech, get versed in more technologies, see what other people are doing. Also, I'm hiring -- want to meet some rockstar data guys.
Hi -- I'm a data scientist / CTO of DeepMile Networks