Intro to Apache Hive and HDFS Erasure Coding


Details
Schedule:
18:00 Doors Open, Sandwiches & Refreshments
18:30 Opening notes and announcements
18:40 Adam Szita, Peter Vary: When it hurts to get rid of data
19:00 Q&A
19:10 Andrew Wang: Introduction to HDFS Erasure Coding
19:30 Q&A
19:45 Open Discussion, Beers, Networking
Talks:
Andrew Wang: Introduction to HDFS Erasure Coding
HDFS erasure coding is a new feature in Hadoop 3 that can cut the storage cost of HDFS by up to half compared with 3x replication. Learn how erasure coding works under the hood, as well as how best to apply it to your applications.
Bio:
Andrew Wang is a software engineer at Cloudera, an Apache Hadoop committer and PMC member, an Apache member, and the release manager of Apache Hadoop 3.0. Before Cloudera, he was a PhD student at the University of California, Berkeley, where he worked on distributed storage systems.
-------------------------------------------------------------------------------------------
Adam Szita, Peter Vary: When it hurts to get rid of data
A short history of Apache Hive: where we are coming from, what Hive is used for, and where we are heading. Customers are currently struggling with large numbers of partitions and tables, and previously adequate methods of handling metadata are becoming more and more inconvenient. We will show some ongoing work that helps alleviate the problem in the short term.
Bio:
Adam Szita is an Apache Hive and Pig committer. He joined Cloudera 2 years ago as one of the first few members of the Engineering team, and currently works in the Budapest Hive team.
Peter Vary is a committer of Apache Hive. He joined Cloudera 2 years ago as one of the first few members of the Engineering team, and currently works in the Budapest Hive team.
Please note that this is an English-speaking event.
