Are you looking for a deeper understanding of how to integrate various tools in the Hadoop ecosystem to implement data management and processing solutions? This talk will discuss how to implement an end-to-end solution with Hadoop through an example case study, along with providing best practices and recommendations for using these tools.
Although there's currently a large amount of material on how to use Hadoop and related components in the Hadoop ecosystem, there's a scarcity of information on how to optimally tie those components together into complete applications, and on how to integrate Hadoop with existing data management systems, a requirement for making Hadoop truly useful in enterprise environments.
This talk will help you put the pieces together through a real-world example architecture, implementing an end-to-end application using Hadoop and components in the Hadoop ecosystem. This example will be used to illustrate important topics such as:
• Modeling data in Hadoop and selecting optimal storage formats
• Moving data between Hadoop and external systems such as relational databases and logs
• Accessing and processing data in Hadoop
• Orchestrating and scheduling workflows on Hadoop
Throughout the example, best practices and considerations for architecting applications on Hadoop will be covered.
Mark Grover is a committer on Apache Bigtop, a committer and PMC member on Apache Sentry (incubating), and a contributor to Apache Hadoop, Apache Hive, Apache Spark, Apache Sqoop, and Apache Flume. He is currently co-authoring O'Reilly's Hadoop Application Architectures title and is a section author of O'Reilly's book on Apache Hive, Programming Hive. He has written guest blog posts and presented at many conferences about technologies in the Hadoop ecosystem. He currently works as a software engineer at Cloudera.