Data Science on H2O Sparkling Water + Apache Spark Performance Tuning


Details
Sparkling Water allows users to combine the fast, scalable machine learning algorithms of H2O with the capabilities of Spark. With Sparkling Water, users can drive computation from Scala/R/Python and utilize the H2O Flow UI, providing an ideal machine learning platform for application developers.
Come and join us for another awesome Spark meetup, where Todd Niven from Telstra is going to talk about H2O Sparkling Water and how it makes data science adoption on Spark much easier.
The second talk will be given by Maksud Ibrahimov, Chief Data Scientist at InfoReady Analytics. He is going to share with us how to maximise the performance of Spark.
As a user of Apache Spark since its very early releases, he has generally found that the framework is easy to start with, but that performance starts to suffer as a program grows. In this talk Maksud will answer the following questions:
- How can you reach a higher level of parallelism in your jobs without scaling up your cluster?
- How do shuffles work, and how can you avoid disk spills?
- How can you identify task stragglers and data skew?
- How can you identify Spark bottlenecks?
As usual, pizza/sandwiches and, of course, beer are provided.
See you all there :)
