• Data R for Kids

    University of Copenhagen, CSS, room 1.1.18

Exciting talk and co-creation event by Sine Zambach:

Data R for Kids
===============

We know from the tobacco industry that we should get 'em while they are young. But if we want to get young people using R – what would it take? For a start, [someone] should develop a new set of exercises that can engage kids and young people in exploring data science and (R) programming. Not sports results or airplane timetables: exercises should be simple, fun, and related to the everyday lives of young students. This event is a co-creation, and I will only entertain briefly. After that, we all take center stage in designing, and perhaps taking the first steps towards, a library of R exercises for kids.
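As a purely hypothetical illustration of what such a kid-friendly exercise could look like (the data and the exercise idea below are invented for this example, not taken from the talk), here is a minimal sketch in base R:

```r
# A made-up mini dataset: minutes five friends spent on their hobbies yesterday
hobby <- c(gaming = 60, football = 45, drawing = 30, TikTok = 90, reading = 20)

# One line of R turns it into a picture kids can argue about
barplot(hobby,
        main = "Minutes spent on hobbies yesterday",
        ylab = "minutes",
        col  = "steelblue")

# Exercise idea: swap in the numbers from your own day and re-run the plot
```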

  • Smooth curves in R and Xaringan shenanigans

    University of Copenhagen, CSS, room 1.1.18

Two exciting upcoming talks:

Smooth curves in R
==================
by Niels Lundtorp Olsen

R is great and flexible software, but it lacks an easy and fast package for dealing with curves, such as those found in functional data and vector graphics. This talk will not be about all the cool tools for making plots; instead we will go one step behind the scenes and look at how smooth curves are represented as data in a computer, and how to implement this in R. Doing things fast in R often means doing them in C++, which will also bring us past object-oriented programming. In the end, we get a package that can easily be applied to functional data and elsewhere. This is work in progress and feedback will be appreciated.

Xaringan shenanigans
====================
by Claus Thorn Ekstrøm

R provides several possibilities for using R Markdown to quickly create excellent presentations combining R code, R output, mathematics, animations, and interactive widgets. In this code-along I will show how to use the package xaringan to build simple, elegant, R Markdown-driven presentations based on the remark.js presentation framework. We will cover formatting, widgets, and caveats, show some tips and tricks, and demonstrate how to customise your own presentation template.
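To make the "curves as data" idea concrete, here is a minimal sketch using base R's spline interpolation (my own illustration, not code from the talk): a smooth curve is stored as a handful of sample points plus an evaluator function.

```r
# A smooth curve represented as data: knot locations and values
x <- seq(0, 2 * pi, length.out = 10)
y <- sin(x)

# splinefun() returns a *function* that evaluates the interpolating curve
f <- splinefun(x, y, method = "natural")

# Evaluate the curve densely and compare with the stored data points
grid <- seq(0, 2 * pi, length.out = 200)
plot(grid, f(grid), type = "l", xlab = "x", ylab = "f(x)")
points(x, y, pch = 19)  # the data the curve is built from
```

For the second talk, xaringan ships an R Markdown template, so a slide-deck skeleton can be generated with one call (a quick-start sketch, not the presenter's material):

```r
# install.packages("xaringan")  # if not already installed

# Create a starter slide deck from xaringan's built-in template ...
rmarkdown::draft("my-slides.Rmd", template = "xaringan",
                 package = "xaringan", edit = FALSE)

# ... and preview it with live reload while you edit
xaringan::inf_mr("my-slides.Rmd")
```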

  • Bayesian Statistics in R / Full-stack data science in R

    University of Copenhagen, CSS, room 1.1.18

Two exciting talks:

Bayesian Statistics in R
========================
by Jonas Lindeløv, Assistant Professor in Cognitive Neuroscience and Neuropsychology, Aalborg University

This workshop will give a conceptual and practical introduction to Bayesian statistics in R. Bayesian statistics has long been known to provide greater flexibility than other approaches, but it is only in recent years that it has become easy to apply this flexibility in practice. In this talk I will discuss Bayes factors for model comparison (as an alternative to p-values) and utility theory as an approach to decision making. The presentation will be based on the notebooks referenced below, but it does not require that these have been studied before the talk.

https://lindeloev.github.io/utility-theory/
https://rpubs.com/lindeloev/bayes_factors

Full-stack data science in R
============================
by Michał Burdukiewicz: bioinformatician affiliated with Warsaw University of Technology, founder of the Why R? Foundation and the Wrocław R Users Group (STWUR), CEO of .prot

Data science requires more than just sufficient statistical knowledge to create a model. Data, often obtained from different sources, must be cleaned, combined and unified, the analysis results visualized, and the model itself made available in a form accessible to the client. The R environment provides tools to support every stage of this process: from data collection through model development to the development of web applications. During my talk, I will present the packages necessary for full-stack, large-scale data science projects in R: drake, mlr and shinyproxy.
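For a taste of the Bayes factor material, here is a minimal sketch using the BayesFactor package (my own example; the talk itself is based on the linked notebooks):

```r
library(BayesFactor)

# Compare two groups: is there evidence for a difference in means?
set.seed(1)
a <- rnorm(50, mean = 0)
b <- rnorm(50, mean = 0.5)

# ttestBF() returns the Bayes factor for H1 (difference) vs H0 (no difference)
bf <- ttestBF(x = a, y = b)
bf  # BF10 > 1 favours a difference, BF10 < 1 favours the null
```

And a minimal drake pipeline, sketching how the workflow tool named in the second talk wires analysis steps together (illustrative only, not the speaker's code):

```r
library(drake)

# Declare analysis steps as a plan; drake tracks the dependencies between them
plan <- drake_plan(
  raw_data  = mtcars,                                  # data collection
  dat_clean = raw_data[raw_data$cyl %in% c(4, 6, 8), ],# data cleaning
  model     = lm(mpg ~ wt + cyl, data = dat_clean)     # model development
)

make(plan)     # run only the steps that are out of date
readd(model)   # retrieve a cached result
```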

  • Easy peasy massive parallel computing / R at scale on the Google Cloud Platform

    University of Copenhagen, CSS, room 1.1.18

Two exciting talks:

Easy peasy massive parallel computing in R
==========================================
by Mikkel Krogsholm

Wouldn't it be nice to be able to write simple R code that scales effortlessly to massively parallel computing? The future and furrr packages in R provide a framework that makes it possible for you to write code that works seamlessly on your laptop or on a supercomputer. With these, R expressions can be evaluated on the local machine, in parallel on a set of local machines, or distributed across a mix of local and remote machines. There is no need to modify any code in order to switch from sequential execution on the local machine to distributed processing on a remote compute cluster. Global variables and functions are also automatically identified and exported as needed, making it straightforward to tweak existing code to make use of futures. This R talk shows you how. We will run through a concrete example that we first execute on a local machine and then on a much more powerful server.

R at scale on the Google Cloud Platform
=======================================
by Mark Edmonson

This talk covers my current thinking on what I consider the optimal way to work with R on the Google Cloud Platform (GCP). It seems this has developed into my niche, and I get questions about it, so I would like to be able to point to a URL. Both R and the GCP evolve rapidly, so this will have to be updated at some point in the future, but even as things stand now you can do some wonderful things with R, and you can multiply those out to potentially billions of users with GCP. The only limit is ambition. The common scenarios I want to cover are: scaling a standalone R script, and scaling Shiny apps and R APIs.
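A minimal sketch of the future/furrr pattern described above (my own example, not the one from the talk): the same mapping code runs sequentially or in parallel depending on a single plan() call.

```r
library(future)
library(furrr)

slow_square <- function(x) {
  Sys.sleep(0.5)  # pretend this is expensive
  x^2
}

# Sequential: runs on the local machine, one task at a time
plan(sequential)
future_map_dbl(1:8, slow_square)

# Parallel: same code, now spread over local worker processes.
# Swapping in e.g. plan(cluster, workers = ...) would target remote machines.
plan(multisession, workers = 4)
future_map_dbl(1:8, slow_square)
```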

  • Introduction to Artificial Neural Networks in R using Keras/TensorFlow

    University of Copenhagen, CSS, room 1.1.18

Introduction
============

This hands-on workshop will be given by Leon Eyrich Jessen. Leon is an assistant professor of bioinformatics in the immunoinformatics and machine learning group at the section of bioinformatics, DTU Health Tech. At the workshop, you will be introduced to some of the underlying theory of artificial neural networks (ANNs) and how an ANN model is created, and we will discuss some of the pitfalls. Finally, you will get to implement, train, tune and apply an ANN-based predictor.

Background
==========

In order for a technology to have a societal impact, it needs to be accessible. The Deep Learning hype we are currently experiencing is partially due to the release of open-source computational frameworks like TensorFlow. Deep Learning has numerous applications for complex pattern recognition within virtually all branches of industry, e.g. customer churn, self-driving cars, cancer diagnostics, market price forecasting, molecular interactions, etc. The aim of this workshop is to empower you to use the TensorFlow technology. The backbone of the Deep Learning revolution is artificial neural networks (ANNs). Historically, ANNs were available only to those with the skills to implement ANN algorithms or to compile existing code. TensorFlow, a general computational framework by Google, was initially only available in Python. However, in 2018 at rstudio::conf in San Diego, RStudio CEO JJ Allaire announced that henceforth Keras and TensorFlow would be fully supported in R. JJ Allaire's presentation included examples from invited blog posts on the official RStudio website, one of which was written by Leon.

Level: Beginner

Prerequisites: Workshop participants should bring their own laptop, with the latest versions of R, RStudio and the R packages 'tidyverse' and 'keras' installed. Alternatively, a free cloud account is available at https://rstudio.cloud. Note that the event is a 3-hour code-along feast!
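For a flavour of the keras R interface the workshop builds on, here is a minimal sketch of defining, compiling and training a small ANN (a toy example on simulated data, not the workshop's exercise):

```r
library(keras)

# Toy data: 200 samples, 4 features, binary outcome
set.seed(42)
x <- matrix(rnorm(200 * 4), ncol = 4)
y <- as.numeric(x[, 1] + x[, 2] > 0)

# A small feed-forward ANN: one hidden layer, sigmoid output
model <- keras_model_sequential() %>%
  layer_dense(units = 8, activation = "relu", input_shape = 4) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "adam",
  loss      = "binary_crossentropy",
  metrics   = "accuracy"
)

# Train, holding out 20% of the data for validation
model %>% fit(x, y, epochs = 20, batch_size = 16, validation_split = 0.2)
```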

  • Cleaning up the data cleaning process + predicting Danish election outcomes

    University of Copenhagen, CSS, room 1.1.18

Here's a little Christmas present: we're rebooting meetings in the CopenhagenR useRs group in the new year. We'll start with two very nice talks.

First talk by *Anne Helby Petersen*: **Cleaning up the data cleaning process with the dataMaid package**

Data cleaning and data validation are the first steps in practically any data analysis, as the validity of the conclusions from the analysis hinges on the quality of the input data. Mistakes in the data can arise for any number of reasons, including erroneous codings, malfunctioning measurement equipment, and inconsistent data generation manuals. However, data cleaning is in itself often a messy endeavor with little structure, direction or documentation – and, worst of all, it is both tedious and time consuming. I will present an R package, dataMaid, that may not make the process less dull, but hopefully a lot quicker. We wrote the dataMaid package in order to 1) spend more time on data analysis (fun) and less time on data validation (boring) by automating some of the validation steps that come up most often; 2) help document the data at all the different stages of the cleaning process; and 3) make it easy to produce a document that non-R-savvy collaborators can read, understand and use to decide "do these data look right?". The dataMaid package includes both very user-friendly one-liner commands that auto-generate data overview reports, and a highly customizable suite of data validation and documentation tools that can be molded to fit most data validation needs. And, perhaps most importantly, it was specifically built to make sure that documentation and validation go hand in hand, so we can clean up the mess that is an unstructured data cleaning process. Isn't that neat?

Second talk by *Mikkel Krogsholm*: **And the winner of the next Danish election is …**

2019 is around the corner, and that means it is election season in Denmark. In this talk I will play around with Danish polling data and show you how to predict who will be Denmark's next Prime Minister. I will discuss some methods used to create a poll of polls in order to make more robust forecasts, and different approaches to estimating uncertainty in polls.

---

We currently have no sponsors for food and drink, so if you know of anyone who could sponsor a bunch of pizzas and drinks then let us know.
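The one-liner the dataMaid talk refers to looks roughly like this (a minimal sketch; makeDataReport() is the package's report generator, and the dataset here is just a built-in example):

```r
library(dataMaid)

# One line: auto-generate a data overview report with summaries and
# potential problems flagged for every variable in the dataset.
makeDataReport(airquality, replace = TRUE)
```

And for the election talk, a poll of polls is, at its simplest, a weighted average of recent polls. A hypothetical sketch with invented numbers (not the speaker's method):

```r
# Hypothetical polls: support for one party, with sample sizes
polls <- data.frame(
  support = c(0.26, 0.24, 0.27, 0.25),
  n       = c(1000, 1500, 800, 1200)
)

# Weight each poll by its sample size
poll_of_polls <- with(polls, weighted.mean(support, n))

# A rough standard error, treating the pooled polls as one big sample
se <- sqrt(poll_of_polls * (1 - poll_of_polls) / sum(polls$n))
c(estimate = poll_of_polls,
  lower    = poll_of_polls - 1.96 * se,
  upper    = poll_of_polls + 1.96 * se)
```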

  • Multi-state Churn Analysis + Unf*ck your code

    ITU, IT University, Auditorium 3

Title: Multi-state churn analysis with a subscription product

Subscriptions are no longer just for newspapers. The consumer product landscape, particularly among e-commerce firms, includes a bevy of subscription-based business models. Internet and mobile phone subscriptions are now commonplace, and joining the ranks are dietary supplements, meals, clothing, cosmetics and personal grooming products. Standard metrics to diagnose a healthy consumer-brand relationship typically include customer purchase frequency and, ultimately, retention of the customer demonstrated by regular purchases. If a brand notices that a customer isn't purchasing, it may consider targeting the customer with discount offers or deploying a tailored messaging campaign in the hope that the customer will return and not "churn". The churn diagnosis, however, becomes more complicated for subscription-based products, many of which offer multiple delivery frequencies and the ability to pause a subscription. Brands with subscription-based products need a reliable measure of churn propensity so they can isolate the factors that lead to churn and preemptively identify at-risk customers. During the presentation I'll show how to analyze churn propensity for products with multiple states, such as different subscription cadences or a paused subscription. If time allows, I'll also present useful plots, developed at Gradient Metrics - a quantitative marketing agency (http://gradientmetrics.com/) - that provide deep insights during such modeling.

Bio: Marcin Kosiński holds a master's degree in Mathematical Statistics with a specialty in Data Analysis. Community events host: co-organizer of the 1600+ member R Enthusiasts meetups in Warsaw and the main organizer of the Polish R Users Conference 2017 ('Why R? 2017', whyr.pl). Interested in R package development and survival analysis models. Currently explores and improves methods for quantitative marketing analyses and global surveys at Gradient Metrics.

Title: Unf*ck your code

This talk is about how to unf*ck your code. And by unf*cking I mean making sure that it works every time, under every condition, and is written in a way that makes sense to you and to others. Because if it doesn't, then your code is f*cked. If you are a researcher, it means doing reproducible research. If you work in business, it means writing production-ready code. And if you are just writing code alone in the dark, it means writing code your future self will understand. This talk is about coding styles, comments, documentation, packaging, tests and Docker. This talk aims at making good programmers out of good data scientists.

Bio: if(Mikkel && R){ message("Amazing!") }

News: R-Ladies are in Copenhagen now! Check out their group: https://www.meetup.com/rladies-copenhagen
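One common way to fit the kind of multi-state model the first talk describes is the msm package; the talk abstract does not name a package, so the following is a hypothetical sketch with invented states, data and rates, for a subscriber who can be active, paused, or churned:

```r
library(msm)

set.seed(7)

# States: 1 = active, 2 = paused, 3 = churned (absorbing).
# Simulate monthly snapshots for 200 hypothetical subscribers.
P <- matrix(c(0.85, 0.10, 0.05,   # from active
              0.30, 0.60, 0.10,   # from paused
              0.00, 0.00, 1.00),  # churned is absorbing
            nrow = 3, byrow = TRUE)

sim_one <- function(id, months = 12) {
  s <- numeric(months); s[1] <- 1
  for (t in 2:months) s[t] <- sample(1:3, 1, prob = P[s[t - 1], ])
  data.frame(id = id, month = 1:months, state = s)
}
dat <- do.call(rbind, lapply(1:200, sim_one))

# Allowed instantaneous transitions (0 = not allowed), with crude initial rates
Q <- rbind(c(0,   0.1, 0.05),
           c(0.2, 0,   0.1),
           c(0,   0,   0))

fit <- msm(state ~ month, subject = id, data = dat, qmatrix = Q)
pmatrix.msm(fit, t = 6)  # 6-month transition probabilities, incl. churn risk
```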

  • Adventures in R

    ITU, IT University of Copenhagen, Auditorium 2

Think Big Analytics (https://www.thinkbiganalytics.com/) is hosting our next meetup. At this meetup Think Big's data scientists will take you on some of their adventures in the R language. There will be food and beverages this time. Mark the date now. A more complete program will be announced later. The venue will be in Copenhagen.

---- CURRENT PROGRAM --------------------------------------------

The current state of naming conventions in R
By Rasmus Bååth

Coming from another programming language, one quickly notes that there are many different naming conventions in use in the *R* community. Looking through packages published on CRAN, one will find that functions and variables are most often either `period.separated` or `underscore_separated`, or written in `lowerCamelCase` or `UpperCamelCase`. In 2012 we did a survey of the popularity of the different naming conventions used in all the packages on CRAN (Bååth, 2012), but a lot has happened since then! Since 2012 CRAN has more than doubled from 4,000 packages to now over 10,000 packages, and we have also seen the rising popularity of the **tidyverse** packages, which often follow the `underscore_separated` naming convention. In this presentation we will show you the current state of naming conventions used in the R community; we will look at what has happened since 2012 and what the current trend is.

# References

Bååth, R. (2012). The state of naming conventions in R. *The R Journal*, 4(2), 74-75. [https://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf](https://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf)

R and Spark - using the sparklyr package to handle big data
By Mikkel Freltoft Krogsholm

R needs to load data into memory before it can perform analysis. This creates a problem if you have more data than your RAM can handle. I will demo how to use the sparklyr package from RStudio to do analysis on a data set that is too big to fit in RAM. I am doing the analysis on Think Big's Data Lab platform.

# References

R, Spark and the sparklyr package: http://spark.rstudio.com/
Think Big Data Lab: http://data-lab.io/landingpage/

Predicting output of a production process
By Laura Frølich

I will go through code used to compare various models on their ability to predict the amount of product produced in a process, using simulated data. I will mention some considerations concerning how the data is simulated. We pretend that the data is stored in Hive, so we make a Spark connection to retrieve it. The data consists of time series of varying lengths, and we look at how a method called PARAFAC2 can be used to handle this.

Top 10 reasons why Hadley Wickham's Tidyverse is just awesome!
By Niels Ole Dam

A group of R packages known as the Tidyverse is rapidly revolutionising how data scientists all over the world think about their work and how they organise their workflow. In this talk I'll give a subjective introduction to the Tidyverse, why it's important, and which of its many features, tips and tricks I find most useful in my daily work wrangling data. The talk will have less focus on theory and models and more on how-tos and on where to start your journey into this vast part of the R universe.

MORE TO COME
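A toy version of the kind of classification behind the naming-convention survey might look like this (my own sketch, not Bååth's actual analysis code):

```r
# Classify a function name into one of the common R naming conventions
naming_convention <- function(name) {
  if (grepl("\\.", name))                             "period.separated"
  else if (grepl("_", name))                          "underscore_separated"
  else if (grepl("^[a-z]+([A-Z][a-z0-9]*)+$", name))  "lowerCamelCase"
  else if (grepl("^([A-Z][a-z0-9]*)+$", name))        "UpperCamelCase"
  else if (grepl("^[a-z0-9]+$", name))                "alllowercase"
  else                                                "other"
}

# Tally the conventions used by the functions in an attached package
fns <- ls("package:stats")
table(vapply(fns, naming_convention, character(1)))
```

And the sparklyr demo follows the pattern documented at http://spark.rstudio.com/ - roughly as below (a local-mode sketch; the actual talk runs on Think Big's Data Lab platform rather than a laptop):

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (a cluster URL would go here in production)
sc <- spark_connect(master = "local")

# Copy a demo table into Spark; on a real cluster the data would already live there
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed outside R's memory
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()  # only the small result is pulled back into R

spark_disconnect(sc)
```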

• R as software development - SKAT (Danish Tax Authorities)

SKAT (Danish Tax Authorities) will host our next meetup. SKAT does some really cool things with R and has set up an awesome software development environment around R. So come join the meetup if you want to learn about writing R code from development to production.

Program:

17:30 – 17:40 Welcome
17:40 – 17:55 Data Science in SKAT (Laurits Søgaard Nielsen)
17:55 – 18:10 SKAT Continuous Delivery R Platform (Rasmus Moseholm)
18:10 – 18:25 Case: Matching foreign data to Danish tax payers (Kari Gunnarsson)

Break

18:40 – 18:55 Statistical Models for Real Estate Value Assessment in ICE (Emil Anker Jørgensen)
18:55 – 19:10 ICE Interactive Shiny App for the Real Estate Model (Kristian Stendorff Nielsen)
19:10 – 19:25 Using Docker to Evaluate the Real Estate Model (Jacob Engelbrecht)

  • Data Driven Stories

    Advice

We have a wonderful program set up for next week, where we will dive into the world of data-driven stories. There will be pizza - compliments of our gracious hosts, Advice. Remember: please un-sign-up if you can't make it. We have a long waiting list.

Talks:

Creating a Data Driven Story
Rasmus Kernn-Jespersen

Rasmus is a journalist at Bias - a new web medium with a fact- and data-focused approach that tries to qualify the political, cultural and social debate in Denmark. Rasmus will talk about how he goes about creating a story using data.

Developments in data journalism
Tommy Birch Kaas, Kaas & Mulvad

Tommy Kaas has since 2007 been part owner of the data journalism business Kaas & Mulvad. He gives an overview of developments in data journalism in Denmark, shows examples of his own work, and gives examples of sources of inspiration.

Analyzing media using R
Emil Lykke Jensen, MediaLytic

MediaLytic analyzes what is written in the media - both in traditional editorial media and on social media. The analyses are based on large amounts of data, which are processed by advanced algorithms and run through a number of different analytical models. The results provide new insights and an overview of the same brand across different media. And all of this is done through R with a dash of Shiny.
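As a hint of what "R with a dash of Shiny" means in practice, here is a minimal Shiny app skeleton (a generic sketch with fake data, unrelated to MediaLytic's actual dashboards):

```r
library(shiny)

ui <- fluidPage(
  titlePanel("Media mentions"),
  sliderInput("n", "Days to show:", min = 7, max = 90, value = 30),
  plotOutput("trend")
)

server <- function(input, output) {
  output$trend <- renderPlot({
    # Fake mention counts; a real app would query the analysis backend
    days <- seq_len(input$n)
    plot(days, rpois(input$n, lambda = 20), type = "h",
         xlab = "day", ylab = "mentions")
  })
}

shinyApp(ui, server)
```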
