
The state of Spark and Hive in the cloud

Hosted By
Nico P.

Details

(This talk will be co-hosted with the Spark group (https://www.meetup.com/Spark-Barcelona/events/241275453/))

Cloud providers now offer on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Spark and Hive come ready to use, with a general-purpose configuration and managed upgrades. Over the last year, the Spark framework and its APIs have evolved very rapidly, with major performance improvements and the release of v2. This pace makes it challenging to keep production services up to date, both on-premises and in the cloud, without compromising compatibility and stability. The talk compares:

• The performance of both v1 and v2 for Spark and Hive

• PaaS cloud services: Azure HDInsight, Amazon Web Services EMR, Google Cloud Dataproc

• Out-of-the-box support for Spark and Hive versions from providers

• PaaS reliability, scalability, and price-performance of the solutions

The comparison uses BigBench, the new Big Data benchmark standard. BigBench combines SQL queries, MapReduce, user code (UDF), and machine learning, which makes it ideal for stressing Spark libraries (Spark SQL, DataFrames, MLlib, etc.).

Agenda:

18:50 - 19:05 - Arrive and meet other members

19:05 - 19:10 - Group announcements

19:10 - 20:00 - Main talk and Q&A

20:00 - 20:30 - Networking and beers (TBC)

About the speaker:

Nicolas Poggi (@ni_po) (https://twitter.com/ni_po) is an IT researcher focusing on the performance and scalability of data-intensive applications and infrastructures. He is currently leading a research project on upcoming architectures for Big Data at the Barcelona Supercomputing Center (BSC) and Microsoft Research joint center. Nicolas received his PhD in Distributed Systems and Computer Architecture at UPC/BarcelonaTech, where he is part of the HPC and Data Centric Computing research groups. He has also been a Research Scholar at IBM Watson, working on Big Data and system performance topics. Nicolas can usually be found speaking at and organizing local IT meetup events.

Big Data Operations On Performance (BDOOP)