Details

Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS) offerings, analytical services like Hive and Spark come preconfigured for general-purpose use, giving users quick entry and on-demand deployment of ready-made SQL-like solutions. This talk evaluates the main PaaS services from an end-user perspective using a popular Hive benchmark. Results focus on the performance, readiness, scalability, and price of the tested providers, including:

• Microsoft Azure HDInsight (HDI)

• Amazon Web Services Elastic MapReduce (EMR)

• Google Dataproc

• Rackspace Cloud Big Data (CBD)

The talk highlights the main performance trends related to hardware and software configuration, pricing, and the similarities and architectural differences among the cloud providers, and compares them to an on-premises commodity cluster. Results also show the importance of application-level tuning and how keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud.

Agenda:

19:00 - Arrive at Trovit and meet other members

19:15 - Main talk starts

20:00 - Discussion, beers, and pizza, courtesy of Trovit search

About the speaker:

Nicolas Poggi (@ni_po, http://www.twitter.com/ni_po) is an IT researcher focusing on the performance and scalability of data-intensive applications and infrastructures. He is currently leading a research project on upcoming architectures for Big Data at the Barcelona Supercomputing Center (BSC) and Microsoft Research joint center. Nicolas received his PhD in Distributed Systems and Computer Architecture at UPC/BarcelonaTech, where he is part of the HPC and Data Centric Computing research groups. He has also been a Research Scholar at IBM Watson, working on Big Data and system performance topics. Publications can be found at: http://personals.ac.upc.edu/npoggi/
