SparkR + H2O

Hosted by Jordi Torres

Details

On this special occasion, the Barcelona Spark Meetup (https://www.meetup.com/es/Spark-Barcelona/), the Barcelona R Users Group (https://www.meetup.com/es/RugBcn-Barcelona-R-users-group/) and the Barcelona Machine Learning Study Group (https://www.meetup.com/es/Grup-destudi-de-machine-learning-de-Barcelona/) are jointly organizing a session on combining Spark, R and machine learning, as part of Big Data Week (http://barcelona.bigdataweek.com/).

When managing data at large scale, we face the problem of handling the data and then processing it quickly. For this reason we try to work in a distributed way. In this workshop we introduce the following tools, which can help.

  • SparkR: combines Spark's robust distributed processing, data sources and off-memory data structures with R's dynamic environment, interactivity, packages and visualization tools.
  • H2O: a framework whose speed and flexibility allow users to fit hundreds or thousands of candidate models while discovering patterns in data.

We recommend bringing your computer with the software installed. We will send the installation instructions soon.

Jordi Puigdellivol Freixa: Lifelong learner, mathematician, data scientist and addicted to solving problems. I earned a Bachelor's degree in Mathematics at the Universitat Autònoma de Barcelona (UAB), where I discovered my passion for algorithms and artificial intelligence. I have also worked in business intelligence and as a consultant analyst. I currently work at Aia on data analysis for financial applications.

Bartek Skorulski: Lifelong learner, mathematician by education, software developer by hobby and board/card/video game player. He completed his PhD in Mathematics, in dynamical systems, at Warsaw University of Technology. He worked at various universities for several years and then decided to try his luck outside the academic environment. He now works as a Data Scientist at King.

Maria José Peláez Montalvo: Data analyst at Schibsted and mathematician specialized in numerical linear algebra. She completed her undergraduate studies at the Complutense University of Madrid and her PhD at Carlos III University of Madrid. She then spent a long time at university doing research and teaching. She now greatly enjoys working in the fascinating field of data analysis. She likes to play with numbers that have people behind them.

For those of you who want to follow the session on your own computer and do the exercises, please install the required packages beforehand.

Installation instructions:

SparkR: follow the instructions at http://sbartek.github.io/sparkRInstall/installSparkReasyWay.html

H2O: first install the latest version of Java from http://www.java.com/es/download/manual.jsp

Then execute the following code in your R (or RStudio) console.

# Uninstall any previous version.
if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }

# Install the packages H2O depends on.
# RCurl needs the libcurl development headers; on Debian/Ubuntu:
#   sudo apt-get install libcurl4-openssl-dev
if (! ("methods" %in% rownames(installed.packages()))) { install.packages("methods") }
if (! ("statmod" %in% rownames(installed.packages()))) { install.packages("statmod") }
if (! ("stats" %in% rownames(installed.packages()))) { install.packages("stats") }
if (! ("graphics" %in% rownames(installed.packages()))) { install.packages("graphics") }
if (! ("RCurl" %in% rownames(installed.packages()))) { install.packages("RCurl") }
if (! ("jsonlite" %in% rownames(installed.packages()))) { install.packages("jsonlite") }
if (! ("tools" %in% rownames(installed.packages()))) { install.packages("tools") }
if (! ("utils" %in% rownames(installed.packages()))) { install.packages("utils") }

# Install h2o.
install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/3/R")))
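
Once the installation has finished, a quick check like the following can confirm that nothing is missing. This is only a minimal sketch: the helper name missing_packages is hypothetical (it is not part of H2O or of the original instructions), and the list of names covers the non-base packages mentioned above plus h2o itself.

```r
# Hypothetical helper (an assumption, not part of the original instructions):
# returns the names in `required` that do not appear in `installed`.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Non-base packages named in the installation steps, plus h2o itself.
required <- c("statmod", "RCurl", "jsonlite", "h2o")
not_installed <- missing_packages(required)

if (length(not_installed) == 0) {
  message("All required packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

If nothing is reported missing, loading the package with library(h2o) and starting a local instance with h2o.init() is a reasonable next check before the session.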

# Install h2oEnsemble (requires the devtools package).
devtools::install_github("h2oai/h2o-2/R/ensemble/h2oEnsemble-package")

See you there!

Barcelona Spark Meetup
UPC Campus nord, AULARI A3 - AULA A3002 - anfiteatre
Jordi Girona 1-3, Barcelona · Barcelona