The state of Spark and Hive in the cloud, by Nico Poggi (BSC-Microsoft Research)


Details
Hello!
We're excited to announce another Meetup before the holidays! This time we are joining forces with the BDOOP Meetup (https://www.meetup.com/BDOOP-BigData-Operations-On-Perfomance-Barcelona/events/241197127/) to host Nicolas Poggi (http://personals.ac.upc.edu/npoggi/). He has been traveling a lot in recent months to give talks at Strata London, DataWorks Munich and DataWorks San Jose, but at last we have him here in Barcelona to talk about his recent studies benchmarking Spark in the cloud!
So let's meet next Thursday, 20th of July, at 19:00 at Trovit Search. Don't miss it!
Abstract:
Cloud providers currently offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Spark and Hive come ready to use, with a general-purpose configuration and upgrade management. Over the last year, the Spark framework and APIs have been evolving very rapidly, with major improvements in performance and the release of v2. This pace makes it challenging to keep production services up to date, both on-premises and in the cloud, while preserving compatibility and stability. The talk compares:
• The performance of both v1 and v2 for Spark and Hive
• PaaS cloud services: Azure HDInsight, Amazon Web Services EMR, Google Cloud Dataproc
• Out-of-the-box support for Spark and Hive versions from providers
• PaaS reliability, scalability, and price-performance of the solutions
The comparison uses BigBench, the new Big Data benchmark standard. BigBench combines SQL queries, MapReduce, user code (UDF), and machine learning, which makes it ideal for stressing Spark libraries (SparkSQL, DataFrames, MLlib, etc.).
Bio:
Nicolas Poggi (@ni_po, https://twitter.com/ni_po) is an IT researcher focusing on the performance and scalability of data-intensive applications and infrastructures. He is currently leading a research project on upcoming architectures for Big Data at the Barcelona Supercomputing Center (BSC) and Microsoft Research joint center. Nicolas received his PhD in Distributed Systems and Computer Architecture at UPC/BarcelonaTech, where he is part of the HPC and Data Centric Computing research groups. He has also been a Research Scholar at IBM Watson, working on Big Data and system performance topics. Nicolas can usually be found speaking at and organizing local IT meetup events.
