
Details

Join us to get the latest scoop on Sqoop and Falcon, and learn how customers are managing Hadoop data and what their use cases are.

For many enterprises, getting data into a data lake can be a big challenge. Part of that challenge is having enterprise-grade governance: knowing who is loading or exporting the data, and what they are doing with it.

Apache Falcon (http://falcon.apache.org/) allows an enterprise to process a single massive dataset stored in HDFS in multiple ways—for batch, interactive and streaming applications. With more data and more users of that data, Apache Falcon’s data governance capabilities play a critical role in managing data pipelines at scale. As the value of Hadoop data increases, so does the importance of cleaning that data, preparing it for business intelligence tools, and removing it from the cluster when it outlives its useful life.

The Falcon framework can also leverage other Hadoop components, such as Pig, HDFS, and Oozie (http://oozie.apache.org/). Falcon enables this simplified management by providing a framework to define, deploy, and manage data pipelines.
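To give a flavor of what such a pipeline definition looks like, here is a minimal sketch of a Falcon process entity in XML. The entity name, cluster name, dates, and workflow path are illustrative placeholders, not details from this meetup:

```xml
<process name="cleanse-pipeline" xmlns="uri:falcon:process:0.1">
  <!-- Run on the named cluster within the given validity window -->
  <clusters>
    <cluster name="primary-cluster">
      <validity start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>1</parallel>
  <order>FIFO</order>
  <frequency>hours(1)</frequency>
  <!-- Falcon delegates the actual processing to an Oozie workflow -->
  <workflow engine="oozie" path="/apps/cleanse/workflow.xml"/>
  <retry policy="periodic" delay="minutes(10)" attempts="3"/>
</process>
```

Falcon schedules the workflow at the declared frequency and handles retries, which is how it keeps pipeline management declarative rather than hand-scripted.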

RDBMS data is another primary data source for the data lake. Apache Sqoop (http://sqoop.apache.org/) is an open source tool to move structured data from an RDBMS to HDFS.
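A typical Sqoop import from an RDBMS into HDFS looks something like the following sketch; the connection string, credentials, table, and target directory are placeholder values, not taken from the announcement:

```shell
# Import the "customers" table from MySQL into HDFS as text files,
# splitting the work across 4 parallel map tasks.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table customers \
  --target-dir /data/lake/raw/customers \
  --num-mappers 4
```

Each map task pulls a slice of the table over JDBC and writes it to the target directory, so imports parallelize across the cluster.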

Come to this meetup to learn how customers are managing their data pipelines, hear about the current state of Falcon and its roadmap, and find out what's coming in Sqoop.

Agenda

6:30-7:00 Doors Open: Registration, Welcome & Networking

7:00-7:20 Hadoop data management use case

7:20-8:00 Apache Falcon data management features in 0.9 and demo

8:00-8:20 Talk on Apache Falcon futures

8:20-8:40 Apache Sqoop 2

8:40 pm Review, Close and Thank you for Attending
