Exploring longitudinal data and Production-Ready R models using AWS

Melbourne Users of R Network (MelbURN)


Two separate topics for this meetup:
Exploring Longitudinal data with the brolgar package
Productionising models in AWS using SageMaker

Thanks again to Servian for venue and catering!

Rough agenda:
5:45 Networking, food & drinks
6:15 First presentation
7:00 Second presentation
7:45 More networking
8:30 Close

First presentation: Making better spaghetti (plots): Exploring the individuals in longitudinal data with the brolgar package

Longitudinal (panel) data provide the opportunity to examine temporal patterns of individuals, because measurements are collected on the same person at different, and often irregular, time points. The data is typically visualised using a "spaghetti plot", where a line plot is drawn for each individual. When overlaid in one plot, it can have the appearance of a bowl of spaghetti. With even a small number of subjects, these plots are too overloaded to be read easily. The interesting aspects of individual differences are lost in the noise.

Longitudinal data is often modeled with a hierarchical linear model to capture the overall trends, and variation among individuals, while accounting for various levels of dependence. However, these models can be difficult to fit, and can miss unusual individual patterns. Better visual tools can help to diagnose longitudinal models, and better capture the individual experiences.

In this talk, Nick will introduce the R package, **brolgar** (BRowse over Longitudinal data Graphically and Analytically in R), which provides tools to identify and summarise interesting individual patterns in longitudinal data.

Dr. Nick Tierney completed his PhD in Statistics at QUT and is now a Lecturer at Monash University. Nick's research aims to improve data analysis workflow. This includes statistical modelling, calculating diagnostics, drawing inferences and making decisions. Crucial to this work is producing high quality software to accompany each research idea. His work so far has focussed on the importance of knowing your data (in the R package visdat), and on creating principles and tools that make it easier to work with, explore, and model missing data (in the R package naniar). Nick has also created an optimisation model that identifies and possibly relocates facilities to maximise their coverage of a population, in the R package maxcovr. Nick loves the R programming language and how it has transformed his world. He is a proud member of the rOpenSci community, a collective that works to make science open using R, and co-hosts “Credibly Curious”, a podcast about everything #rstats, with Dr. Saskia Freytag.

Second presentation: Building Scalable Production-Ready R model using AWS

In this growing world of Machine Learning, we often roam around creating POCs (proofs of concept), but an important aspect that is often forgotten is how we bring models from POC into production. Productionising a model and ensuring it scales is not as easy as clicking a button or making a minor code change. Even after a model is in production, we must re-evaluate it, perform A/B testing and redeploy it. This is all part of the data science pipeline. All of these steps are time-consuming and require many configuration changes.

In this presentation, Jeno will show how everyday data scientists can take a model they have built in R and productionise it with ease using Docker and AWS SageMaker. The talk will cover training, deployment and A/B testing of a model built in R, along with some aspects of AWS and Docker.

As a Consultant at Servian, Jeno has been helping various businesses build scalable data and cloud solutions in the areas of AI/ML, Data Warehousing, Data Migration, Serverless, and Containers. He holds a Master of Data Science degree from Monash University and is a lover of R. When he is not busy browsing Medium for the latest tech or analysing and modelling in R, he competes in hackathons and datathons around Melbourne.