Benchmarking Hadoop Workloads
Designing a new Hadoop cluster or optimizing a running one is becoming an art! The large number of hardware and software configuration options, such as on-premise vs. on-cloud clusters, SATA arrays vs. SSDs, choice of distribution, and more than 100 tunable Hadoop parameters, can greatly affect execution times, scalability, and of course the Total Cost of Ownership (TCO) of data processing! Benchmarking allows us to learn and plan how an application behaves and scales under different configurations and loads, and how resources are utilized, in order to make better decisions and optimizations.
This talk will present HiBench (https://github.com/intel-hadoop/HiBench), a benchmark suite developed by Intel that includes several ready-to-use benchmarks of different categories. We will look at the resource requirements of each workload, such as I/O, memory, and CPU, and at how Hadoop configurations can affect the total running time and TCO of the cluster. Results and insights on how to improve performance and reduce the TCO of clusters will be discussed.
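As a taste of what running a HiBench workload looks like, here is a minimal sketch based on the layout of the HiBench repository. Exact paths, script names, and configuration keys have changed between HiBench releases, and the cluster addresses below are hypothetical, so treat all of them as assumptions to verify against the README of the version you clone.

```
# Sketch: running a single HiBench workload against an existing Hadoop
# cluster. Requires a live cluster; paths and config keys are assumptions
# to check against the HiBench README for your release.

git clone https://github.com/intel-hadoop/HiBench.git
cd HiBench

# Point HiBench at your cluster (hypothetical values for illustration),
# typically in conf/hadoop.conf:
#   hibench.hadoop.home   /usr/local/hadoop
#   hibench.hdfs.master   hdfs://namenode:8020

# Run one workload, e.g. TeraSort; timings and throughput are appended
# to the report file.
bin/workloads/micro/terasort/hadoop/run.sh
cat report/hibench.report
```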
19:00 - Arrive at Itnig and meet other members
19:15 - Talk: Benchmarking Hadoop Workloads (by Nico Poggi)
19:45 - Q&A and discussion of topics
20:00 - Networking, pizza and beers
About the presenter:
Nico Poggi (@ni_po (https://twitter.com/ni_po)) is an IT professional with a focus on the performance and scalability of Web and data-intensive applications. He is currently leading a new research project on upcoming architectures for data processing at the Barcelona Supercomputing Center (BSC) and Microsoft Research joint center (http://www.bscmsrc.eu/). Nicolas received his PhD from BarcelonaTech (UPC) and combines a pragmatic approach to performance and scalability with Machine Learning techniques. His publications can be found at: http://personals.ac.upc.edu/npoggi/