
SparkR + H2O

On this special occasion, the Barcelona Spark Meetup, the Barcelona R Users Group, and the Barcelona Machine Learning Study Group are jointly organizing, as part of the Big Data Week event, a session on combining Spark, R, and machine learning.

When managing data at large scale, we face the problem of both handling the data and processing it quickly. For this reason we try to work in a distributed way. In this workshop we introduce the following tools, which can help.

- SparkR: combines the distributed, robust processing, data sources, and off-memory data structures of Spark with the dynamic environment, interactivity, packages, and visualization tools of R.
- H2O: a framework whose speed and flexibility allow users to fit hundreds or thousands of candidate models while discovering patterns in data.
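
To make the two descriptions above more concrete, here is a minimal sketch, not the material used in the session. It assumes the SparkR and h2o packages, Java, and a local Spark installation are available. The function names `sparkr_sketch` and `h2o_sketch` and the hyperparameter values are hypothetical. The SparkR part uses the `sparkR.session()` entry point from Spark 2.x and later; Spark releases from the time of this event used a different initialization, so treat this as an illustration of the idea only.

```r
# Sketch only: nothing below contacts Spark or H2O until a function is called.

# SparkR: copy an R data.frame into Spark, aggregate there, collect the result.
sparkr_sketch <- function() {
  SparkR::sparkR.session(master = "local[*]")   # local Spark session
  on.exit(SparkR::sparkR.session.stop())        # always shut the session down
  df <- SparkR::as.DataFrame(faithful)          # distributed copy of 'faithful'
  counts <- SparkR::summarize(
    SparkR::groupBy(df, df$waiting),
    count = SparkR::n(df$waiting)
  )
  SparkR::collect(counts)                       # back to a local data.frame
}

# H2O: every row of this grid is one candidate model to fit.
grid <- expand.grid(ntrees = c(20, 50), max_depth = c(2, 4, 6))

h2o_sketch <- function(grid) {
  h2o::h2o.init(nthreads = -1)                  # start or connect to a local H2O
  train <- h2o::as.h2o(iris)
  predictors <- setdiff(names(iris), "Species")
  grid$mse <- vapply(seq_len(nrow(grid)), function(i) {
    model <- h2o::h2o.gbm(x = predictors, y = "Species",
                          training_frame = train,
                          ntrees = grid$ntrees[i],
                          max_depth = grid$max_depth[i])
    h2o::h2o.mse(model)                         # training error, illustration only
  }, numeric(1))
  grid[order(grid$mse), ]                       # lowest training error first
}

# Usage, once everything is installed:
# sparkr_sketch()
# h2o_sketch(grid)
```

The grid here has only six combinations; the same loop scales to many more candidates. A real comparison would score the models on held-out data rather than on training error.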

We recommend bringing your computer with the software already installed. We will send the installation instructions soon.

Jordi Puigdellivol Freixa: Lifelong learner, mathematician, data scientist, and addicted to solving problems. I did a Bachelor's Degree in Mathematics at the Universitat Autònoma de Barcelona (UAB), where I discovered my passion for algorithms and artificial intelligence. I have also worked in business intelligence and as a consultant analyst. I'm currently working at Aia on data analysis for financial applications.

Bartek Skorulski: Lifelong learner, mathematician by education, software developer by hobby, and board/card/video game player. He completed his PhD in Mathematics, in Dynamical Systems, at Warsaw University of Technology. He worked at various universities for several years and then decided to try his luck outside the academic environment. He now works as a Data Scientist at King.


Maria José Peláez Montalvo: Data analyst at Schibsted and mathematician specialized in numerical linear algebra. She completed her undergraduate studies at the Complutense University of Madrid and then did her PhD at Carlos III University of Madrid. She then spent a long time working at universities, doing research and teaching. She now greatly enjoys working in the fascinating area of data analysis. She likes to play with numbers that have people behind them.


Installation instructions:

For those of you who want to follow the session on your own computer and do the exercises, please install the required packages beforehand.

SparkR: follow the instructions in the link http://sbartek.github.io/sparkRInstall/installSparkReasyWay.html

H2O: first install the latest version of Java: http://www.java.com/es/download/manual.jsp

Then execute the following code in your R (or RStudio) console.

# Uninstall any previous version of h2o.
if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }

# Install the packages that h2o depends on.
# RCurl needs the libcurl development headers; on Debian/Ubuntu:
#   sudo apt-get install libcurl4-openssl-dev
if (! ("methods"  %in% rownames(installed.packages()))) { install.packages("methods") }
if (! ("statmod"  %in% rownames(installed.packages()))) { install.packages("statmod") }
if (! ("stats"    %in% rownames(installed.packages()))) { install.packages("stats")   }
if (! ("graphics" %in% rownames(installed.packages()))) { install.packages("graphics")}
if (! ("RCurl"    %in% rownames(installed.packages()))) { install.packages("RCurl")   }
if (! ("jsonlite" %in% rownames(installed.packages()))) { install.packages("jsonlite")}
if (! ("tools"    %in% rownames(installed.packages()))) { install.packages("tools")   }
if (! ("utils"    %in% rownames(installed.packages()))) { install.packages("utils")   }

# Install h2o from the rel-tibshirani release repository.
install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/3/R")))

# Install h2oEnsemble from GitHub (install_github comes from the devtools package).
if (! ("devtools" %in% rownames(installed.packages()))) { install.packages("devtools") }
devtools::install_github("h2oai/h2o-2/R/ensemble/h2oEnsemble-package")
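
Before the session it may help to confirm that the installation steps above took effect. The following is a minimal sketch using base R only. The function name `check_setup` is hypothetical, and the check is deliberately shallow: it reports whether a `java` executable is on the PATH and whether each package namespace can be loaded. It does not verify the Java version or whether a full JDK is present.

```r
# Report whether each prerequisite for the session appears to be in place.
# Returns a named logical vector: "java" first, then one entry per package.
check_setup <- function(packages = c("SparkR", "h2o", "h2oEnsemble")) {
  # Sys.which() returns "" when the executable is not found on the PATH.
  java_found <- unname(nzchar(Sys.which("java")))
  # requireNamespace() returns FALSE, without an error, for missing packages.
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  c(java = java_found, installed)
}

print(check_setup())
```

Any entry reported as FALSE points to the step that is worth repeating.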



  • Diego

    Many thanks to Maria José and Bartek for such an interesting and detailed tutorial on SparkR. You made its potential and its current limits very clear. And thanks to Jordi for the H2O presentation, a powerful tool that I will keep in mind for future projects :-)

    1 · November 26, 2015

  • Elisenda

    A pity that you changed the time. Will you post the meetup resources somewhere?

    November 25, 2015

  • Aleix Ruiz De V.

    You can find the SparkR presentation at
    https://github.com/CotePelaez/SimpleTutorialSparkR

    2 · November 25, 2015

  • Xavier

    Is there anywhere else with the instructions?
    These did not work for me...

    November 24, 2015

    • Xavier

      I'm away at the moment, but first R says that I need a Java JDK, which it says will prompt for installation, and that does not fix it. I have the latest version of Java and El Capitan.
      I have not gotten past that point.

      November 24, 2015

    • Aleix Ruiz De V.

      But is it h2o or sparkr that is complaining?

      1 · November 24, 2015
