Data scientists struggle to find the right tools for processing Big Data. Wouldn’t it be nice if one could keep using familiar laptop-based tools, such as R, to analyze Big Data?
In this talk, Jorge will introduce Distributed R, an extension to R for Big Data processing. Distributed R enables large-scale machine learning, statistical analysis, and graph processing by splitting tasks across multiple cores and machines in a cluster. As a result, Distributed R is much faster than regular R and can handle much larger workloads. Data scientists can continue using their familiar R environment, benefit from a number of out-of-the-box parallel algorithms, and even write their own custom parallel applications.
Jorge will use the Kaggle March Madness dataset as an example to show how Distributed R can be used to solve real-life machine learning problems.
Bio: Jorge Martinez is part of the HP Vertica engineering team and works on the HP Distributed R product. His interests are distributed systems and machine learning.