Adventures in R


Details
Think Big Analytics (https://www.thinkbiganalytics.com/) is hosting our next meetup. At this meetup Think Big's data scientists will take you on some of their adventures in the R language.
There will be food and beverages this time.
Mark the date now.
A more complete program will be announced later.
The venue will be in Copenhagen.
---- CURRENT PROGRAM --------------------------------------------
The current state of naming conventions in R
By Rasmus Bååth
Coming from another programming language, one quickly notes that many different naming conventions are in use in the R community. Looking through packages published on CRAN, one finds that function and variable names are most often either `period.separated` or `underscore_separated`, or written in `lowerCamelCase` or `UpperCamelCase`. In 2012 we surveyed the popularity of the naming conventions used across all packages on CRAN (Bååth, 2012), but a lot has happened since then! Since 2012 CRAN has more than doubled, from 4,000 packages to over 10,000, and we have also seen the rising popularity of the tidyverse packages, which often follow the `underscore_separated` naming convention.
In this presentation we will show you the current state of naming conventions used in the R community. We will look at what has happened since 2012 and what the current trend is.
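To make the conventions concrete, here is a minimal sketch of how names could be sorted into the categories mentioned above. The function name `classify_name`, the regular expressions, and the extra `alllowercase` and `other` buckets are assumptions made for illustration; they are not necessarily the rules used in the 2012 survey or in the talk.

```r
# Classify a single R identifier by naming convention.
# The patterns below are illustrative assumptions, not the survey's rules.
classify_name <- function(name) {
  if (grepl("^[a-z0-9]+$", name, perl = TRUE)) {
    "alllowercase"            # e.g. paste (fits several conventions at once)
  } else if (grepl("^[a-z0-9]+(\\.[a-z0-9]+)+$", name, perl = TRUE)) {
    "period.separated"        # e.g. read.csv
  } else if (grepl("^[a-z0-9]+(_[a-z0-9]+)+$", name, perl = TRUE)) {
    "underscore_separated"    # e.g. read_csv
  } else if (grepl("^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$", name, perl = TRUE)) {
    "lowerCamelCase"          # e.g. colMeans
  } else if (grepl("^([A-Z][a-z0-9]*)+$", name, perl = TRUE)) {
    "UpperCamelCase"          # e.g. Reduce
  } else {
    "other"                   # mixed styles such as as.Date
  }
}

example_names <- c("read.csv", "read_csv", "colMeans", "Reduce", "paste", "as.Date")

# vapply keeps the input names, so each result is labelled by its identifier.
conventions <- vapply(example_names, classify_name, character(1))

# Count how many of the example names fall under each convention.
print(table(conventions))
```

Applied to the exported names of every package on CRAN instead of six hand-picked examples, the same tabulation would give a rough picture of how popular each convention is.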
# References
Bååth, R. (2012). The state of naming conventions in R. The R Journal, 4(2), 74-75. https://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf
R and Spark - using the sparklyr package to handle big data
By Mikkel Freltoft Krogsholm
R needs to load data into memory before it can perform analysis. This creates a problem if you have more data than your RAM can handle. I will demo how to use the sparklyr package from RStudio to analyse a data set that is too big to fit in RAM. I will run the analysis on Think Big's Data Lab platform.
# References
R, Spark and the sparklyr package: http://spark.rstudio.com/
Think Big Data Lab: http://data-lab.io/landingpage/
Predicting output of production process
By Laura Frølich
I will go through code that compares various models on their ability to predict the amount of product produced in a process, using simulated data. I will mention some considerations about how the data is simulated. We pretend that the data is stored in Hive, so we make a Spark connection to retrieve it. The data consists of time series of varying lengths, and we look at how a method called PARAFAC2 can be used to handle this.
Top 10 reasons why Hadley Wickham's Tidyverse is just awesome!
By Niels Ole Dam
A group of R packages known as the Tidyverse is rapidly revolutionising how data scientists all over the world think about their work and how they organise their workflow. In this talk I'll give a subjective introduction to the Tidyverse, why it's important, and which of its many features, tips and tricks I find most useful in my daily work wrangling data. The talk will focus less on theory and models and more on how-tos and on where to start your journey into this vast part of the R universe.
MORE TO COME
