The state of Spark and Hive in the cloud


Details
(This talk will be co-hosted with the Spark group (https://www.meetup.com/Spark-Barcelona/events/241275453/))
Cloud providers currently offer convenient, on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Spark and Hive come ready to use, with a general-purpose configuration and managed upgrades. Over the last year, the Spark framework and its APIs have evolved very rapidly, with major performance improvements and the release of v2. This makes it challenging to keep production services up to date, both on-premises and in the cloud, while preserving compatibility and stability. The talk compares:
• The performance of both v1 and v2 for Spark and Hive
• PaaS cloud services: Azure HDInsight, Amazon Web Services EMR, Google Cloud Dataproc
• Out-of-the-box support from each provider for Spark and Hive versions
• PaaS reliability, scalability, and price-performance of the solutions
The comparison uses BigBench, the new Big Data benchmark standard. BigBench combines SQL queries, MapReduce, user code (UDFs), and machine learning, which makes it ideal for stressing Spark libraries (SparkSQL, DataFrames, MLlib, etc.).
Agenda:
18:50 - 19:05 - Arrive and meet other members
19:05 - 19:10 - Group announcements
19:10 - 20:00 - Main talk and Q&A
20:00 - 20:30 - Networking and beers (TBC)
About the speaker:
Nicolas Poggi (@ni_po) (https://twitter.com/ni_po) is an IT researcher with a focus on the performance and scalability of data-intensive applications and infrastructures. He is currently leading a research project on upcoming architectures for Big Data at the Barcelona Supercomputing Center (BSC) and Microsoft Research joint center. Nicolas received his PhD in Distributed Systems and Computer Architecture from UPC/BarcelonaTech, where he is a member of the HPC and Data Centric Computing research groups. He has also been a Research Scholar at IBM Watson, working on Big Data and system performance topics. Nicolas can often be found speaking at and organizing local IT meetup events.