Introduction to RHadoop for R users


21 people went


From the Digbeth High street end of Gibb Street:
Turn left into the first archway (next to the 80s sports clothing shop), go through the glass door, walk under the floating bodies and head downstairs. Turn right and you're there!

Digital Native, Room LG04, The Custard Factory, Gibb Street, Birmingham, B9 4AA


Andrie de Vries & Simon Field of Revolution Analytics present a preview of their useR! 2015 tutorial introducing RHadoop. You do not need any knowledge of Hadoop to attend.

This tutorial introduces RHadoop to data scientists who are new to Hadoop. We give a gentle introduction to the terminology, develop the prototypical word count example, and then illustrate distributed computing concepts such as k-means clustering and linear regression on distributed data.
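Linear regression can run on distributed data because the least-squares fit depends only on a handful of sums (n, Σx, Σy, Σx², Σxy) that each node can compute on its own chunk and that are then simply added together. The sketch below shows this for simple linear regression in plain Python. It is an illustration of the general idea, not RHadoop code and not necessarily how the tutorial implements it; the chunk layout and the function names `map_chunk`, `reduce_stats` and `fit_line` are hypothetical.

```python
from functools import reduce


def map_chunk(chunk):
    # "Map" step: one node summarises its own chunk of (x, y) pairs
    # into partial sums. The list-of-pairs layout is a hypothetical choice.
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def reduce_stats(a, b):
    # "Reduce" step: partial sums from different chunks are just added.
    return tuple(u + v for u, v in zip(a, b))


def fit_line(chunks):
    # Combine all partial sums, then solve the normal equations
    # for intercept and slope using only those totals.
    n, sx, sy, sxx, sxy = reduce(reduce_stats, map(map_chunk, chunks))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope
```

With data lying exactly on y = 2 + 3x split across two chunks, `fit_line([[(0, 2), (1, 5)], [(2, 8), (3, 11)]])` returns `(2.0, 3.0)`. No node ever needs to see the raw data held by another node, which is what makes the approach suit map-reduce.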

Hadoop is a much-hyped technology for big data. Originally developed by companies with web-scale data, it is increasingly being evaluated by IT departments in many other industries.

The R data scientist must know how to adapt algorithms to the Hadoop map-reduce paradigm. Fortunately, R has many features of functional languages; for example, lapply() is a simple illustration of the map-reduce philosophy. This makes it comparatively easy for an R user to understand the map-reduce idea.
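The word count example mentioned above is the standard way to see the three phases of map-reduce: a map step that, much like lapply(), applies the same function to every input record and emits (key, value) pairs; a shuffle that groups values by key; and a reduce step that combines each group. The sketch below simulates those phases in plain Python on a single machine. It is an illustration only, not the RHadoop API, and the names `mapper`, `shuffle`, `reducer` and `word_count` are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between
    # the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    # Reduce: combine all values for one key (here, sum the counts).
    return key, sum(values)


def word_count(lines):
    # Apply the mapper to every line, similar in spirit to lapply() in R.
    mapped = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())
```

For example, `word_count(["the cat", "The dog the end"])` gives `{"the": 3, "cat": 1, "dog": 1, "end": 1}`. On a real cluster the mapper and reducer calls would run on different nodes, but the logic of each function stays the same.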

The RHadoop project is an abstraction layer around the Hadoop map-reduce paradigm and the Hadoop Distributed File System (HDFS), so you can focus on writing R code rather than learning Java.

We plan to provision a Microsoft Azure Hadoop cluster to use on the day. This means you can get stuck in straight away, coding in RStudio Server directly from your web browser (Google Chrome works best).

Seating is limited to 30, with up to 40 places in total (if you can make do without a seat!)

(It also helps if you let us know when you're not attending.)

The speakers, organiser & anyone else who wants to join will be meeting for food and drinks at Alfie Bird's from about 5pm.