Data updates in HDFS-like stores using one of Apache Hive™, Hudi or Iceberg


Details
Managing updates to data in HDFS-like append-only systems has always been an interesting challenge to address. Different projects have tried to solve this in a variety of ways.
Join us for a meetup hosted by Adobe Experience Platform. We are excited to have Ryan Blue, Vinoth Chandar and Eugene Koifman, committers on the Apache Iceberg (incubating), Apache Hudi (incubating) and Apache Hive™ projects, speak about how they went about addressing this problem.
Agenda:
- 6:00 - 6:20 PM - Mingling & Refreshments
- 6:20 - 6:30 PM - Introductions by Adobe
- 6:30 - 7:00 PM - Hive ACID by Eugene Koifman, Cloudera, Apache Hive PMC
- 7:00 - 7:30 PM - Hudi by Vinoth Chandar, Uber, Apache Hudi PPMC
- 7:30 - 8:00 PM - Iceberg by Ryan Blue, Netflix, Apache Iceberg PPMC
- 8:00 - 8:30 PM - General Q&A
“Hive ACID”, Eugene Koifman, Cloudera, Hive PMC
Apache Hive has matured over time to support more features found in traditional databases. The talk will cover support for ACID transactions in Apache Hive 3.0 and data modification operations. The talk will describe the intended use cases and architecture of the implementation.
Eugene Koifman is an Apache Hive committer and a member of the PMC. For the last 6 years, he has been on the Hive team at Hortonworks (now Cloudera), concentrating on adding support for transactional tables and operations such as Update and Merge. Prior to Cloudera, he worked on a federated SQL engine at Composite Software and held various engineering roles at BEA, Oracle, and others.
“Hudi”, Vinoth Chandar, Uber, Apache Hudi PPMC
The Hudi storage system was created at Uber. This talk will cover how Hudi provides atomic upserts and incremental change streams directly on top of typical Hadoop-compatible big data storage (HDFS/S3/GCS). The talk will cover motivating use cases and also briefly touch upon the conceptual underpinnings, tradeoffs, and key design choices. The speaker will share how Hudi is leveraged at Uber and provide code recipes using the Hudi toolset to accomplish common tasks like database ingestion and log de-duplication.
Vinoth Chandar is a Staff Software Engineer at Uber, where he architected much of its foundational big data infrastructure. Vinoth has a keen interest in data systems and years of experience working on diverse distributed systems such as key-value stores, database replication, storage engines, and cluster management.
“Iceberg”, Ryan Blue, Netflix, Apache Iceberg PPMC
This talk will focus on how Apache Iceberg commits changes to a table and handles conflicts between concurrent writers. It will also cover some background on Iceberg, including design choices and how those choices affect concurrency. Last, this talk will include a project status update and some recent lessons learned.
Ryan Blue works on open source components of Netflix's data platform, including Spark and Iceberg. He created Apache Iceberg to solve many of the persistent scale and usability problems that he hit while working on Hadoop ecosystem projects for the past 8 years. He is an Apache member and a PMC member of Apache Parquet and Avro.
