Optimising your R code

Name: Optimising your R code
Start: 2019-06-19T17:30:00+10:00
End: 2019-06-19T20:00:00+10:00
Location: Amazon Office MEL11

Hosted By

Yuval M.

Details

In this meetup we have two presentations that will help you optimise your R code so that it runs faster and can cope with more data!

Catering will be provided by our host, Amazon. Big thanks!

Rough agenda:
5:30 Networking, food & drinks
6:00 First presentation
6:45 Second presentation
7:30 More networking
8:00 Close

First presentation: You don't need Spark for medium data

Medium data is an important segment of datasets sandwiched between small data (datasets that can be manipulated in R or Python/Pandas) and big data (datasets that require distributing data over many computers to be handled effectively, e.g. with Hadoop/Spark). This segment is important because it is difficult to analyse without proper tools, yet it is the predominant form of data in many industries, including banking. The canonical tools for dealing with medium data include Dask, JuliaDB.jl, SAS, and Spark, and there aren't any good options in R. In this talk, we will present the disk.frame (https://github.com/xiaodaigh/disk.frame) R package, a new medium-data manipulation framework that is simple, fast, and (hopefully) intuitive to use. We will showcase how to summarise 1.8 billion data points on a laptop within minutes using disk.frame.

ZJ has more than 10 years of experience in credit risk modelling, analytics and data science, and has recently become an independent consultant. He has a maths background and runs the Sydney Competitive Programming meetup and the Julia (Julialang) meetup.

Second presentation: Integrating R and C++

Integrating R and C++ is useful when you need to speed up code that runs slowly in R (for example, loss functions for time-series models), or when you want to use C++ libraries from R. In this talk, Slava will show you how it's done!

Slava Razbash has worked in data science roles in multinationals, startups and even a university. He has contributed to the forecast R package. His foremost contribution to the package is the implementation of the BATS and TBATS models, whose loss functions are written in C++. Slava is the founder of the Enterprise Data Science Architecture Conference (https://edsaconf.io). He is also the organiser of the "Enterprise Data Science Architecture" meetup group (https://www.meetup.com/Enterprise-Data-Science-Architecture/), the "AI Engineers of Melbourne" meetup group (http://meetu.ps/c/4hQpS/3wtjv/d) and the "Timeseries Forecasting and Event Analytics" meetup group (http://meetu.ps/c/3VcM8/3wtjv/d).
