As Halloween approaches, take some time out of your hectic candy-buying schedule to learn about data tidying! 🎃
Papers We Love is an international organization centered around the appreciation of computer science research papers. There's so much we can learn from the landmark research that shaped the field and the current studies that are shaping our future. Our goal is to create a community of tech professionals passionate about learning and sharing knowledge. Come join us!
New to research papers? Watch The Refreshingly Rewarding Realm of Research Papers by Sean Cribbs.
Ideas and suggestions are welcome: fill out our interest survey here and let us know what motivates you!
// Tentative Schedule
• 7:00-7:30–Networking + informal paper discussion
• 7:30-7:35–Introduction and announcements
• 7:45-8:45–Tidy Data, presented by Andrew Breza
• 8:40-9:00–Food and informal paper discussion
CustomInk Cafe (3rd Floor)
Mosaic District, 2910 District Ave #300
Fairfax, VA 22031
When you get here, you can come in via the patio, which is accessible via the outside stairs near True Food. Don't be scared by the metal gate and sign. If you're driving, there is a parking garage next door, and a walkway leads to the patio from the 3rd floor of the garage nearest MOM's Organic Market.
Due to SafeTrack, we recommend that anyone taking the metro to this event consider the 2A bus from the East Falls Church metro station. It will circumvent the worst of the scheduled delays. The bus stops at Gallows Rd and Lee Hwy, a few minutes' walk from the CustomInk office. For more information, read here.
The Dunn Loring metro station is about 0.7 miles from our meetup location. It’s very walkable, but if you’d prefer a bus, the 402 Southbound and 1A/1B/1C Westbound leave from Dunn Loring Station about every 5-10 minutes (see the schedule for a more detailed timetable).
If you're late, we totally understand, so please still come! Entering via the patio is best. Just be sure to slip in quietly if a speaker is presenting.
- Tidy Data by Hadley Wickham
homepage | pdf
Abstract: "A huge amount of effort is spent cleaning data to get it ready for analysis, but there has been little research on how to make data cleaning as easy and effective as possible. This paper tackles a small, but important, component of data cleaning: data tidying. Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. This framework makes it easy to tidy messy datasets because only a small set of tools are needed to deal with a wide range of un-tidy datasets. This structure also makes it easier to develop tidy tools for data analysis, tools that both input and output tidy datasets. The advantages of a consistent data structure and matching tools are demonstrated with a case study free from mundane data manipulation chores."
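For a taste of the idea before the talk, here is a minimal sketch in plain Python. It is not taken from the paper: the table, the column names, and the `melt` helper are all made up for illustration. It reshapes a "messy" table, where one variable (treatment) is spread across column headers, into the tidy form the abstract describes, with each variable in its own column and each observation in its own row.

```python
# Hypothetical messy table: one row per person, one column per treatment.
# The column headers "treatment_a" and "treatment_b" are really values of
# a variable (which treatment), not variables themselves.
messy = [
    {"name": "Ada", "treatment_a": 4, "treatment_b": 7},
    {"name": "Ben", "treatment_a": 12, "treatment_b": 9},
    {"name": "Cho", "treatment_a": 5, "treatment_b": 2},
]


def melt(rows, id_key, value_keys, var_name, value_name):
    """Turn each (row, value column) pair into its own observation row."""
    tidy_rows = []
    for row in rows:
        for key in value_keys:
            tidy_rows.append(
                {id_key: row[id_key], var_name: key, value_name: row[key]}
            )
    return tidy_rows


# Tidy form: the columns are name, treatment, and result,
# and every row is a single observation.
tidy = melt(messy, "name", ["treatment_a", "treatment_b"], "treatment", "result")

for record in tidy:
    print(record)
```

Once the data is in this shape, filtering, grouping, and plotting by treatment no longer depend on how many treatment columns the original table happened to have.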
About Dr. Hadley Wickham: Hadley Wickham is a statistician from New Zealand who is currently Chief Scientist at RStudio and an adjunct Assistant Professor of statistics at Rice University. He is a prominent and active member of the R user community and has developed several notable and widely used packages including ggplot2, plyr, dplyr, and reshape2. He was named a Fellow by the American Statistical Association in 2015 for "pivotal contributions to statistical practice through innovative and pioneering research in statistical graphics and computing" (from Wikipedia).