Update - we will have a surprise guest joining us: Reza Zadeh.
Also, I will be collecting donations for the Daily Bread Food Bank. If you have RSVP'd, please bring a non-perishable food item.
Reza Zadeh is a Consulting Professor of Computational Mathematics at Stanford and a Technical Advisor at Databricks. He focuses on discrete applied mathematics, machine learning theory and applications, and large-scale distributed computing. More information is available on his website: http://stanford.edu/~rezab/
We are lucky enough to have Mark Grover, an Apache Bigtop committer and an Apache Sentry (incubating) committer and PPMC member, visiting us in December. We have limited seats, so please RSVP only if you intend to come.
Mark Grover is a committer on Apache Bigtop, a committer and PPMC member on Apache Sentry (incubating), and a contributor to Apache Hadoop, Apache Spark, Apache Hive, Apache Sqoop, Apache Pig, and Apache Flume. He is currently co-authoring O'Reilly's Hadoop Application Architectures and is a section author of O'Reilly's book on Apache Hive, Programming Hive. He has written several guest blog posts and presented at many conferences about technologies in the Hadoop ecosystem.
What we will cover:
Implementing solutions with Apache Hadoop requires understanding not just Hadoop itself, but a broad range of related projects in the Hadoop ecosystem, such as Hive, Pig, Oozie, Sqoop, and Flume. The good news is that there is an abundance of material (books, websites, conferences, etc.) for gaining a deep understanding of Hadoop and these related projects. The bad news is that there is still a scarcity of information on how to integrate these components to implement complete solutions. In this tutorial we'll walk through an end-to-end case study of a clickstream analytics engine to provide a concrete example of how to architect and implement a complete solution with Hadoop. We'll use this example to illustrate important topics such as:
• Modeling data in Hadoop and selecting optimal storage formats
• Moving data between Hadoop and external data management systems such as relational databases
• Moving event-based data, such as logs and machine-generated data, into Hadoop
• Accessing and processing data in Hadoop
• Orchestrating and scheduling workflows on Hadoop
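To give a flavor of the data-movement and orchestration topics above, the commands involved might look something like the following sketch. All hostnames, table names, file names, and paths here are hypothetical placeholders, not details from the actual case study:

```shell
# Import clickstream session data from a relational database into HDFS
# using Sqoop (connection string, credentials, and table are placeholders).
sqoop import \
  --connect jdbc:mysql://db.example.com/clickstream \
  --username etl_user -P \
  --table sessions \
  --target-dir /data/clickstream/sessions

# Stream web-server logs into HDFS with a Flume agent; agent.conf and
# the agent name "a1" are placeholder values.
flume-ng agent --conf ./conf --conf-file agent.conf --name a1

# Submit an Oozie workflow to orchestrate the downstream processing;
# the Oozie server URL and job.properties file are placeholders.
oozie job -oozie http://oozie.example.com:11000/oozie -config job.properties -run
```

These commands assume a running Hadoop cluster with Sqoop, Flume, and Oozie installed; the tutorial will cover how the pieces fit together in practice.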
Throughout the example, we'll cover best practices and considerations for architecting applications on Hadoop. This talk will be valuable for developers, architects, and project leads who are already knowledgeable about Hadoop and are now looking for more insight into how it can be leveraged to implement real-world applications.