New York Open Statistical Programming Meetup Message Board › Reproducible Research: saving indexable versions of (model, data, result) in a NoSQL database
New York, NY
When building models, we often go through many stages of analysis: studying relationships between candidate predictors and the response, selecting variables, and fitting many different models to see which individual model or which ensemble performs best. In the process, we need to keep around many versions of data, code, and model objects, along with custom results and evaluation metrics.
I've looked at some of the earlier posts, and jobman from deeplearning.net in particular seems to move in this direction. But has anyone tried to organize their versions of (model, data, result) in a NoSQL database (for example, as key-value stores)? I mostly use R for analysis, and also Python, but I would prefer tools generic enough to accommodate both.
The idea of organizing everything in some sort of key-value store is to support look-ups later on. Say you want to compare graphs of a particular evaluation metric across all your models.
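To make the idea concrete, here is a minimal sketch in plain Python of what one record in such a store might look like. All field names here are purely illustrative, and an in-memory dict stands in for whatever key-value database ends up being used:

```python
import hashlib
import json

# One hypothetical (model, data, result) record; every field name is illustrative.
record = {
    "model": {"type": "glmnet", "params": {"alpha": 0.5, "lambda": 0.01}},
    "data": {"file": "train_v3.csv"},
    "result": {"metrics": {"auc": 0.87, "rmse": 1.23}},
}

# Derive a stable key from the record's content, so re-running the
# identical (model, data) pair maps back to the same entry.
key = hashlib.sha1(json.dumps(record, sort_keys=True).encode()).hexdigest()
store = {key: record}

# Later look-up: pull one evaluation metric out of every stored run
# to compare it across models.
aucs = {k: v["result"]["metrics"]["auc"] for k, v in store.items()}
```

The content-derived key is one design choice among several; a plain run id or timestamp would work too, at the cost of not deduplicating identical runs.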
Some combination of git and NoSQL seems like it would do the job. Remember, I'd like to keep versions of the whole (model, data, result) tuple. Since my dataset might change as well (e.g. it could have errors I didn't catch before), I'd like to record the full history.
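For the data side, one way to borrow from git without depending on it is content addressing: derive each dataset's version id purely from its bytes (the scheme below mirrors how git hashes blobs), so a corrected dataset automatically gets a new, immutable key that the model/result records can reference. A rough sketch:

```python
import hashlib

def data_version(raw_bytes: bytes) -> str:
    """Content-address a dataset the way git addresses blob objects:
    sha1 over a 'blob <size>\\0' header plus the raw bytes. The key
    depends only on content, so every revision is distinguishable."""
    header = f"blob {len(raw_bytes)}\0".encode()
    return hashlib.sha1(header + raw_bytes).hexdigest()

v1 = data_version(b"x,y\n1,2\n")   # original dataset
v2 = data_version(b"x,y\n1,3\n")   # after fixing an error
assert v1 != v2                    # both versions stay in the history
```

Storing these ids in the NoSQL records, while keeping the actual files under git, would let the database stay small and still point at exact data revisions.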
Please advise which NoSQL database is most suitable for reproducible research, i.e. for storing all versions of model, data, and results in an indexable format.
Storing everything as an opaque blob is not good enough, since I would like to query, for example, which model has the best result on a given metric, or which models used a particular predictor.
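These are exactly the queries a document store handles well once the fields are stored structured rather than as blobs. A sketch of the two queries above, using an in-memory list of dicts as a stand-in for a real document database (field and model names are made up for illustration):

```python
# In-memory stand-in for a document store; a real NoSQL database
# (e.g. MongoDB) would index these fields and run equivalent queries.
runs = [
    {"model": "rf_v1",  "predictors": ["age", "income"], "metrics": {"auc": 0.81}},
    {"model": "gbm_v2", "predictors": ["age", "spend"],  "metrics": {"auc": 0.88}},
    {"model": "glm_v1", "predictors": ["income"],        "metrics": {"auc": 0.74}},
]

# Query 1: which model has the best result on a given metric?
best = max(runs, key=lambda r: r["metrics"]["auc"])

# Query 2: which models used a particular predictor?
used_age = [r["model"] for r in runs if "age" in r["predictors"]]
```

The point is just that predictors and metrics need to be first-class fields in each record; whichever database is chosen, the schema matters more than the engine for making these look-ups cheap.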