The unsexy part of data science: data munging

Real-world data is usually dirty, messy, and full of errors. To get reasonable results from your statistical modeling, you'll need to explore the data, clean it up, transform it, and prepare it for modeling. Data scientists are commonly said to spend about 80% of their time on this tedious task of data munging. But how do you do it?

In this meetup we'll have five short talks on data munging, 10-15 minutes each. We'll then open the floor for Q&A to address further issues.

Talks:

1. Szilard Pafka: Intro and overview

2. Yasmin Lucero: Munging date-times in R: tools, tricks, gotchas 

3. Daniel Gutierrez: Data Munging: the Good, the Bad and the Ugly 

4. Neal Fultz: Tidy Data, Facts & Rules for R

5. Eric Klusman: Plyr for split-apply-combine

Timeline:

- 6:00pm food/drinks and networking

- 7:00pm talks start promptly

Please arrive by 6:55pm at the latest.

Please RSVP as places are limited.

Venue: General Assembly ( http://generalassemb.ly ) will kindly host this meetup. There is no parking provided. Q ( http://www.qconnects.com/ ) will kindly sponsor/provide the food and drinks.


  • Matti S

    I also wanted to invite everyone here to get a few R topics lined up for SCALE 12x (Feb 21, 22, 23). The call for presentations is open and I would love to see more R and other open source tech talks. https://www.socallinuxexpo.org/blog/scale-12x

    November 14, 2013

    • Matti S

      If you have any questions please do ask me at a meetup or here. Thanks! - Matti

      November 14, 2013

    • Tim Triche, J.

      Great to meet you at BDC LA today. I foresee something on dplyr, ggvis, or jvmr for SCALE 12. Should be fun.

      November 16, 2013

  • James C.

    Good meetup! I liked the short presentations, though some of the speakers were a bit rushed by the time constraint. It might be fun to have a mix of short, topical presentations and longer, in-depth presentations.

    Thanks again to General Assembly for the great venue and to Q for the catering!

    November 16, 2013

  • Daniel G.

    I thought I'd write a piece about all my friends at the LA R User Group: http://inside-bigdata.com/2013/11/15/unsexy-part-data-science-data-munging/

    November 15, 2013

  • Jeff W.

    Great presentations, approaching the subject from multiple perspectives and with multiple tools and packages. Thanks to the presenters.

    November 15, 2013

  • Andrew D.

    Great! I learned some things from everyone. Especially, now I have a concept of what plyr does for you. Thanks for not locking the doors at 7PM-- rte 60 was closed, causing a massive traffic jam in the San Gabriel Valley--in addition to the usual traffic. I would have hated to have had to turn around and go home.
    Andy Dagis

    November 15, 2013

  • Boniface

    Good stuff

    November 15, 2013

  • - Szilard Pafka -

    We'll raffle 5 O'Reilly books.

    I wrote (for the previous raffle) a few lines of R to randomly select the winners. Here is the R code: https://gist.github.com/szilard/7036195 (you'll need to update the event_id).

    I just ran it and got the winners; see the list in my reply to this comment. We'll go down the list, skipping those not present at the meetup: #1 will choose a book, then #2 will choose from the remaining books, and so on until all the books are gone.
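
The pick order described in this comment can be sketched as follows. This is only an illustration in Python, not the R code in the linked gist: the function names `draw_order` and `assign_picks`, the seed, and the sample names are all hypothetical.

```python
import random


def draw_order(rsvp_names, seed=None):
    """Return the RSVP names in a random order (the ranked raffle list)."""
    rng = random.Random(seed)      # seeded generator so a draw can be repeated
    order = list(rsvp_names)       # copy, so the caller's list is left untouched
    rng.shuffle(order)
    return order


def assign_picks(ranked_names, present, n_books):
    """Walk the ranked list, skipping absentees, until n_books people have picked."""
    pickers = []
    for name in ranked_names:
        if len(pickers) == n_books:
            break                  # all books are gone
        if name in present:
            pickers.append(name)   # this person chooses from the remaining books
    return pickers


# Hypothetical data, for illustration only.
rsvps = ["Ana", "Ben", "Chen", "Dee", "Eli", "Fay", "Gus"]
present = {"Ben", "Dee", "Eli", "Gus"}

ranked = draw_order(rsvps, seed=1)
pickers = assign_picks(ranked, present, n_books=3)
print(pickers)
```

Separating the random draw from the walk down the list means the draw can be run and posted before the event, with attendance applied only on the night.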

    Szilard

    November 14, 2013

    • - Szilard Pafka -

      The list:
      1 Eduardo Arino de la Rubia
      2 Ray A
      3 Jane Carlen
      4 Szilard Pafka
      5 Tuan H. Nguyen
      6 Alexandria Luostari
      7 Ketung
      8 Sofia Suo
      9 Kiet Nguyen
      10 Jason Kuan
      11 Tilly Wang
      12 Jason DeVita
      13 Matthew
      14 Wayne Smith
      15 Anirudh Ranganath Narasimhan
      16 Sam
      17 Daniel Gutierrez
      18 Alex Nano
      19 Yves
      20 Shannon Callan

      November 14, 2013

    • - Szilard Pafka -

      We went down the list to #14 getting the last book ;)

      November 14, 2013

  • Yasmin L.

    Here are my slides for tonight: http://rpubs.com/yolio/10562

    November 14, 2013

  • Jane C.

    Sadly can't make it after all. The good news is that someone else moved up in the book raffle (I was third).

    November 14, 2013

  • David

    Sadly I won't be able to make it...

    November 14, 2013

  • Anthony J.

    Will there be a recording if I can't make it in time? Thanks!

    November 14, 2013

  • Raghuram S.

    Unexpected conflict, will miss it :( Please post a link if there is a recording...

    November 14, 2013

  • Eric R.

    I have to miss it. :( I was looking forward to this one, too.

    November 14, 2013

  • - Szilard Pafka -

    So that we can plan for food/drinks, please review your RSVP and change it to No if you are not coming.

    November 12, 2013

  • Mark

    I had to go out of town for business. Enjoy!

    November 12, 2013

  • Raghuram S.

    Yes!

    November 11, 2013

  • Angelica Zavala L.

    Great topic for my first meeting! Looking forward to it.

    November 7, 2013

  • Leela Krishna K

    Yes

    November 3, 2013

  • A former member

    This is so interesting. Great topic. Thanks Szilard.

    October 1, 2013

  • Jeff W.

    Great idea for the meetup. I'm new to all this and have heard this number in a number of places.

    October 1, 2013

  • Esteban

    I did not know it was common to spend 80% of our time doing this. Good to know I'm not alone.

    October 1, 2013
