
Cleaning up the data cleaning process + predicting Danish election outcomes

Hosted By
Claus E.

Details

Here's a little Christmas present: We're rebooting meetings in the CopenhagenR useRs group in the new year. We'll start with two very nice talks:

First talk by Anne Helby Petersen:
Cleaning up the data cleaning process with the dataMaid package

Data cleaning and data validation are the first steps in practically any data analysis, as the validity of the conclusions from the analysis hinges on the quality of the input data. Mistakes in the data can arise for any number of reasons, including erroneous codings, malfunctioning measurement equipment, and inconsistent data generation manuals. However, data cleaning is in itself often a messy endeavor with little structure, direction or documentation, and worst of all, it is both tedious and time-consuming. I will present an R package, dataMaid, that may not make the process less dull, but hopefully a lot quicker. We wrote the dataMaid package in order to 1) spend more time on data analysis (fun) and less time on data validation (boring) by automating some of the validation steps that come up most often; 2) help document the data at all the different stages of the cleaning process; 3) make it easy to produce a document that non-R-savvy collaborators can read, understand and use to decide “do these data look right?”. The dataMaid package includes both very user-friendly one-liner commands that auto-generate data overview reports and a highly customizable suite of data validation and documentation tools that can be molded to fit most data validation needs. And, perhaps most importantly, it was specifically built to make sure that documentation and validation go hand in hand, so we can clean up the mess that is an unstructured data cleaning process. Isn’t that neat?

Second talk by Mikkel Krogsholm:
And the winner of the next Danish election is …

2019 is around the corner, and that means it is election season in Denmark. In this talk I will play around with Danish polling data and show you how to predict who will be Denmark's next prime minister.

I will discuss some methods used to create a poll of polls in order to make more robust forecasts, as well as different approaches to estimating uncertainty in polls.
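
As a rough illustration of the poll-of-polls idea (not necessarily the method covered in the talk), the sketch below pools several polls for a single party by weighting each poll by its sample size, then attaches a simple binomial standard error. The function name `poll_of_polls` and the numbers are made up for illustration, and the error estimate assumes the pooled polls behave like one simple random sample, which real polls rarely do.

```python
import math


def poll_of_polls(polls):
    """Pool polls for one party, weighting each poll by its sample size.

    polls: list of (share, sample_size) tuples, with share as a
    proportion between 0 and 1.
    Returns (pooled_share, standard_error).
    """
    total_n = sum(n for _, n in polls)
    pooled = sum(share * n for share, n in polls) / total_n
    # Binomial standard error, treating the pooled polls as a single
    # simple random sample (a simplifying assumption).
    se = math.sqrt(pooled * (1 - pooled) / total_n)
    return pooled, se


# Hypothetical numbers for illustration only, not real polling data.
example_polls = [(0.26, 1000), (0.24, 1500), (0.27, 500)]
share, se = poll_of_polls(example_polls)

# Approximate 95% interval using the normal approximation.
low, high = share - 1.96 * se, share + 1.96 * se
print(f"pooled share: {share:.3f}, 95% interval: [{low:.3f}, {high:.3f}]")
```

Weighting by sample size is only one possible choice; weights based on how recent a poll is or on the polling firm are common alternatives, and each changes how the uncertainty should be estimated.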

---

We currently have no sponsors for food and drink, so if you know of anyone willing to sponsor a bunch of pizzas and drinks, let us know.

CopenhagenR - useR Group
University of Copenhagen, CSS, room 1.1.18
Øster Farimagsgade 5B · Copenhagen