New Year's R-esolution: Clean Data With R!

There's a lot of value in data out there - but there's a lot of nonsense too. Knowing is half the battle!

In this talk, Aaron Schumacher will show you how to use R to interrogate your data and whip it into shape so that you can get good results. We'll iteratively load, explore, merge, graph, and use assertive code to keep our data honest.

In preparation for the talk, please be sure you have installed R (http://watson.nci.nih.gov/cran_mirror/) and RStudio (http://www.rstudio.com/ide/download/desktop). If you've never used R, you could warm up with a free web tutorial on the basics: http://tryr.codeschool.com/

We'll get you up and running with everything else you need once you arrive!

About Aaron

Aaron Schumacher is a data scientist interested in understanding the world. He has helped people learn to use R, Python, and other tools at NYU and General Assembly. He likes visualization, machine learning, and breakdancing.

  • Nevin H.

    A boring guy like me actually enjoys such a discussion, well done!

    January 13, 2014

  • Aaron S.

    Hi all! I just put up some commented code:
    https://github.com/ajschumacher/clean_data_with_R/blob/master/commented_code.R
    It looks like the problem with checking the rounding of percentages was a floating point issue that I hadn't noticed while preparing - yet another insidious problem when doing analysis!
    More could certainly be done for this example data set - if you want to add more to the repo, demonstrate techniques that you like, etc., that would certainly be cool!

    January 12, 2014

  • Harlan H.

    And for those who haven't seen it, the "wat" talk about Ruby and JavaScript: https://www.destroyallsoftware.com/talks/wat

    January 9, 2014

  • Chad A.

    Great stuff! Thanks for sharing, Aaron. And thanks to Robert and ARPC for hosting.

    January 9, 2014

  • A former member

    Aaron was a fountain of hands-on information.

    January 9, 2014

  • A former member

    Lots of great info, jobs available and practical tools. Thanks!

    January 9, 2014

  • Aaron S.

    Thanks everyone! This was a lot of fun, and everybody was really great - I hope the community will continue to grow and share more ideas and techniques!

    The "talk" part of the talk is all up on my blog (probably more eloquently than I delivered it, since I wrote it off-stage) with the mostly graphic slides as well:

    http://planspace.org/2014/01/07/clean-data-with-r/

    January 9, 2014

    • Aaron S.

      That also has a link to the GitHub repo where the data is, and where I also just put up the complete command history of everything I ran in RStudio during the talk - not because I think my live-coding is a model of what finished R code should look like, but so that folks can look at it while it's fresh and perhaps gain by understanding what it's doing where it works and why it's failing where it fails. I want to clean up my code notes and comment things a little better, and then I'll add that as well sometime in the next few days (realistically this weekend).

      Thanks all! Looking forward to more data wrangling in the future! :)

      January 9, 2014

  • Neil C

    Will there be time at the end for questions? And by questions, I mean breakdancing.

    January 8, 2014

    • A former member

      Absolutely! Especially break dancing.

      January 8, 2014

  • Domenico

    I am having trouble downloading the data. Do I need to join GitHub? I am receiving the message that the data is too big to display. Is there code to download?

    January 8, 2014

    • Aaron S.

      Oh - and I don't have code posted on the repo yet; just the data for now!

      January 8, 2014

    • Mahesh P.

      When you go to the link (https://github.com/ajs...), there is an option in the right-hand menu to download all the files as a zip. Look for "Download Zip".

      January 8, 2014

  • Aaron S.

    Hi! The main data set that we'll work with tomorrow is available here:

    https://github.com/ajschumacher/clean_data_with_R

    I think GW's wifi may not be really public, so if you want to try and follow along in the final segment tomorrow, you'll want to download this Excel file before you come!

    January 7, 2014

    • Mahesh P.

      Are there any non-standard R packages that we need to download?

      January 8, 2014

    • Aaron S.

      I may mention a couple, but I'm planning to keep all the session coding in base R.

      January 8, 2014

  • A former member

    Just picked up a few hundred cookies for tomorrow night!

    January 7, 2014

    • Domenico

      What is the best way to get there if I am using the Metro?

      January 7, 2014

    • Mahesh P.

      Get off at Foggy Bottom; it's a 3-minute walk from there.

      January 7, 2014

  • Michael B.

    Sorry, just joined and cannot make the meeting.

    January 7, 2014

    • A former member

      Next time!

      January 7, 2014

  • A former member

    Sadly, my Wednesday nights are booked until March (evening school), so I'll be missing the next 2 :(

    December 5, 2013

    • A former member

      We'll be recording all the meetups, Jennifer, so you'll be covered.

      December 5, 2013
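
Aaron's January 12 comment mentions that the percentage rounding check failed because of a floating point issue. The actual check lives in the linked repository and is not reproduced here. As a minimal base R sketch of how this kind of problem can arise, using made-up values (share_a, share_b, reported_total, and the tolerance are all hypothetical, not taken from the talk's data set):

```r
# Hypothetical values, not the data set from the talk:
# two shares that should add up to a reported total share of 30%.
share_a <- 0.1
share_b <- 0.2
reported_total <- 0.3

# Exact comparison fails: in binary floating point the sum is
# very slightly greater than the stored value of 0.3.
exact_match <- (share_a + share_b) == reported_total

# all.equal() compares within a small tolerance. Wrap it in isTRUE()
# because on a mismatch it returns a character description, not FALSE.
tolerant_match <- isTRUE(all.equal(share_a + share_b, reported_total))

# An explicit tolerance keeps the assumption visible in assertive code.
tolerance <- 1e-8  # assumed value; choose one suited to your data
stopifnot(abs((share_a + share_b) - reported_total) < tolerance)

print(exact_match)     # FALSE
print(tolerant_match)  # TRUE
```

When asserting that computed and reported numbers agree, comparing within a tolerance is usually safer than using == on doubles.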
