Real-world data is usually dirty, messy, and full of errors. If you want to get reasonable results from your statistical modeling, you'll need to explore the data, clean it up, transform it, and prepare it for modeling. Data scientists commonly spend 80% of their time on this tedious task of data munging. But how do you do it?
In this meetup we'll have five short talks (10-15 minutes each) on data munging. We'll then open the floor for Q&A to address further questions.
1. Szilard Pafka: Intro and overview
2. Yasmin Lucero: Munging date-times in R: tools, tricks, gotchas
3. Daniel Gutierrez: Data Munging: the Good, the Bad and the Ugly
4. Neal Fultz: Tidy Data, Facts & Rules for R
5. Eric Klusman: Plyr for split-apply-combine
- 6:00pm food/drinks and networking
- 7:00pm talks start promptly
Please arrive by 6:55pm at the latest.
Please RSVP as places are limited.
Venue: General Assembly ( http://generalassemb.ly ) will kindly host this meetup. There is no parking provided. Q ( http://www.qconnects.com/ ) will kindly sponsor/provide the food and drinks.