New Year's R-esolution: Clean Data With R!

There's a lot of value in data out there - but there's a lot of nonsense too. Knowing is half the battle!

In this talk, Aaron Schumacher will show you how to use R to interrogate your data and whip it into shape so that you can get good results. We'll iteratively load, explore, merge, graph, and use assertive code to keep our data honest.
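
As a rough illustration of what "assertive code" can look like, here is a minimal base R sketch. It is not the code from the talk: the data frames, column names, and values below are all made up for this example. The idea is simply to state your assumptions about the data with stopifnot() so that the script fails loudly as soon as one of them is violated.

```r
# Hypothetical toy data (made up for illustration; not the talk's data set).
scores <- data.frame(
  school = c("A", "B", "C"),
  tested = c(120, 85, 40),
  passed = c(90, 85, 10),
  stringsAsFactors = FALSE
)
boroughs <- data.frame(
  school  = c("A", "B", "C"),
  borough = c("X", "Y", "X"),
  stringsAsFactors = FALSE
)

# Assert basic expectations before doing anything else.
stopifnot(
  anyDuplicated(scores$school) == 0,   # one row per school
  !anyNA(scores),                      # no missing values
  all(scores$tested > 0),              # someone was tested everywhere
  all(scores$passed >= 0),
  all(scores$passed <= scores$tested)  # cannot pass more than were tested
)

# Merge, then assert that the merge neither dropped nor duplicated rows.
merged <- merge(scores, boroughs, by = "school")
stopifnot(nrow(merged) == nrow(scores))

# Derive a percentage and assert that it is in a sensible range.
merged$pct_passed <- 100 * merged$passed / merged$tested
stopifnot(all(merged$pct_passed >= 0 & merged$pct_passed <= 100))
```

If any check fails, stopifnot() stops with an error naming the failed expression, which is usually a better outcome than a silently wrong graph or summary later on.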

In preparation for the talk, please be sure you have installed R (http://watson.nci.nih.gov/cran_mirror/) and RStudio (http://www.rstudio.com/ide/download/desktop). If you've never used R, you could warm up with a free web tutorial on the basics: http://tryr.codeschool.com/

We'll get you up and running with everything else you need once you arrive!

About Aaron

Aaron Schumacher is a data scientist interested in understanding the world. He has helped people learn to use R, Python, and other tools at NYU and General Assembly. He likes visualization, machine learning, and breakdancing.

  • Nevin H.

    A boring guy like me actually enjoys such a discussion, well done!

    January 13, 2014

  • A former member

    Hi all! I just put up some commented code:
    https://github.com/ajschumacher/clean_data_with_R/blob/master/commented_code.R
    It looks like the problem with checking the rounding of percentages was a floating point issue that I hadn't noticed while preparing - yet another insidious problem when doing analysis!
    More could certainly be done for this example data set - if you want to add more to the repo, demonstrate techniques that you like, etc., that would certainly be cool!

    January 12, 2014

  • Harlan H.

    And for those who haven't seen it, the "wat" talk about Ruby and JavaScript: https://www.destroyallsoftware.com/talks/wat

    1 · January 9, 2014

  • Chad A.

    Great stuff! Thanks for sharing, Aaron. And thanks to Robert and ARPC for hosting.

    2 · January 9, 2014

  • Robert D.

    Aaron was a fountain of hands-on information.

    January 9, 2014

  • Janeen

    Lots of great info, jobs available and practical tools. Thanks!

    1 · January 9, 2014

  • A former member

    Thanks everyone! This was a lot of fun, and everybody was really great - I hope the community will continue to grow and share more ideas and techniques!

    The "talk" part of the talk is all up on my blog (probably more eloquently than I delivered it, since I wrote it off-stage) with the mostly graphic slides as well:

    http://planspace.org/2014/01/07/clean-data-with-r/

    1 · January 9, 2014

    • A former member

      That post also has a link to the GitHub repo where the data is, and where I also just put up the complete command history of everything I ran in RStudio during the talk. That's not because I think my live-coding is a model of what finished R code should look like, but so that folks can look at it while it's fresh and perhaps gain by understanding what it's doing where it works and why it's failing where it fails. I want to clean up my code notes and comment things a little better, and then I'll add that as well sometime in the next few days (realistically this weekend).

      Thanks all! Looking forward to more data wrangling in the future! :)

      1 · January 9, 2014

  • Neil C

    Will there be time at the end for questions? And by questions, I mean breakdancing.

    1 · January 8, 2014

    • Robert D.

      Absolutely! Especially break dancing.

      January 8, 2014

  • Domenico

    I am having trouble downloading the data. Do I need to join GitHub? I am receiving the message that the data is too big to display. Is there code to download?

    January 8, 2014

    • A former member

      Oh - and I don't have code posted on the repo yet; just the data for now!

      January 8, 2014

    • Mahesh P.

      When you go to the link https://github.com/ajs..., there is an option in the right-hand menu to download all the files as a zip. Look for "Download Zip".

      January 8, 2014

  • A former member

    Hi! The main data set that we'll work with tomorrow is available here:

    https://github.com/ajschumacher/clean_data_with_R

    I think GW's wifi may not be really public, so if you want to try and follow along in the final segment tomorrow, you'll want to download this Excel file before you come!

    2 · January 7, 2014

    • Mahesh P.

      Are there any non-standard R packages that we need to download?

      January 8, 2014

    • A former member

      I may mention a couple, but I'm planning to keep all the session coding in base R.

      1 · January 8, 2014

  • Josh P.

    Hello all, I may not be able to attend tonight but I hear rumors that the talk will be recorded. Where will I be able to find access to that?

    January 8, 2014

    • Robert D.

      Hi Josh. Unfortunately, the talk tonight won't be recorded; however, I'm sure Aaron will post his slides online. His code is already on GitHub.

      January 8, 2014

  • Robert D.

    Just picked up a few hundred cookies for tomorrow night!

    January 7, 2014

    • Domenico

      What is the best way to get there if I am using the metro?

      January 7, 2014

    • Mahesh P.

      Get off at Foggy Bottom; it's a 3-minute walk from there.

      1 · January 7, 2014

  • Michael B.

    Sorry, just joined and cannot make the meeting.

    January 7, 2014

  • Jennifer A S.

    Sadly, my Wednesday nights are booked until March (evening school), so I'll be missing the next 2 :(

    December 5, 2013

    • Robert D.

      We'll be recording all the meetups, Jennifer, so you'll be covered.

      December 5, 2013
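
The floating point issue mentioned in the comments above (checking the rounding of percentages) is easy to reproduce in general terms. The following base R sketch is not the code from the talk or the repo; the numbers and the helper name pct_agrees are made up for illustration. It shows why exact equality on computed decimals can fail, and one possible way to compare a reported percentage against counts with an explicit tolerance.

```r
# Exact comparison of computed doubles can fail even when the math "obviously" works.
exact_match <- (0.1 + 0.2) == 0.3                       # FALSE with IEEE 754 doubles

# all.equal() compares with a small tolerance; wrap it in isTRUE() for a clean logical.
tolerant_match <- isTRUE(all.equal(0.1 + 0.2, 0.3))     # TRUE

# Hypothetical helper: does a reported percentage agree with the counts,
# allowing for rounding to `digits` decimal places plus a small numeric tolerance?
pct_agrees <- function(reported, part, whole, digits = 1, tol = 1e-8) {
  half_unit <- 0.5 * 10^(-digits)   # largest error that rounding alone can introduce
  abs(reported - 100 * part / whole) <= half_unit + tol
}

# Made-up example: 1 of 3 and 2 of 3, reported to one decimal place.
ok  <- pct_agrees(c(33.3, 66.7), part = c(1, 2), whole = c(3, 3))
bad <- pct_agrees(34.0, part = 1, whole = 3)

# Used assertively, a check like this stops the script when reported values drift.
stopifnot(all(ok))
```

Comparing within a tolerance, rather than rounding and then testing with ==, keeps the check from depending on exactly how a particular decimal happens to be represented in binary.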

Our Sponsors

  • ARPC

    Economic, financial, statistical, analytics and operational consulting.

  • Data Community DC

    DWDC is a proud member meetup of the DC2 community.

  • Statistics.com

    15% off EVERYTHING with code "DC2"

  • Cloudera

    Organizational sponsor of Data Community DC!
