2-for-1: Resource Management in Modern Hadoop + Hadoop Application Architecture

Hosted By
Jason S.

Details

Start Strata a bit early with 2 talks...

Talk 1: Resource Management in Modern Hadoop Clusters

Abstract:

Modern Hadoop clusters can share resources elastically among multiple frameworks, each tailored to a specific use case. In this talk, we delve into how YARN, Llama, and other infrastructure components help achieve this. We elaborate on the technical and operational aspects of a typical cluster that shares its resources among (1) MapReduce for traditional batch processing, (2) Spark for complex analytics and machine learning, (3) Spark Streaming for stream processing, and (4) Impala for interactive SQL.

Bio:

Karthik Kambatla is a Software Engineer at Cloudera, Hadoop Committer, and a PhD student. He works primarily on scheduling and resource management in the Hadoop ecosystem.

Talk 2: Architectural considerations for Hadoop applications

Abstract:

Implementing solutions with Apache Hadoop requires understanding not just Hadoop, but a broad range of related projects in the Hadoop ecosystem such as Hive, Pig, Oozie, Sqoop, and Flume. The good news is that there's an abundance of materials (books, websites, conferences, etc.) for gaining a deep understanding of Hadoop and these related projects. The bad news is that there's still a scarcity of information on how to integrate these components to implement complete solutions. In this session, we'll walk through concepts related to Hadoop application design that are presented in O'Reilly's Hadoop Application Architectures book.

This talk will be valuable for developers, architects, and project leads who are already knowledgeable about Hadoop and are now looking for more insight into how it can be used to implement real-world applications.

Bio:

Mark Grover is a committer on Apache Bigtop and a committer and PMC member on Apache Sentry (incubating). He has contributed code to Apache Hadoop, Apache Hive, Apache Spark, Apache Pig, Apache Sqoop and Apache Flume. He is a co-author of O'Reilly's Hadoop Application Architectures title and has authored a chapter in O'Reilly's Programming Hive title. He is a software engineer at Cloudera working on integrating various open source technologies in the Hadoop ecosystem.

New York Hadoop User group
PlaceIQ
115 E 23rd St., 7th Floor · New York, NY