Data updates in HDFS-like stores using one of Apache Hive™, Hudi or Iceberg

Name: Data updates in HDFS-like stores using one of Apache Hive™, Hudi or Iceberg
Start: 2019-02-26T18:00:00-08:00
End: 2019-02-26T20:30:00-08:00
Location: Adobe World Headquarters

Hosted by Jaemi B. and Hitesh S.

Adobe Experience Platform Meetups

Details

Managing updates to data in append-only, HDFS-like storage systems has long been a challenging problem, and different projects have approached it in different ways.

Join us for a meetup hosted by Adobe Experience Platform. We are excited to have Ryan Blue, Vinoth Chandar, and Eugene Koifman, committers on the Apache Iceberg (incubating), Apache Hudi (incubating), and Apache Hive™ projects respectively, speak about how each project addressed this problem.

Agenda:

6:00 - 6:20 PM - Mingling & Refreshments
6:20 - 6:30 PM - Introductions by Adobe
6:30 - 7:00 PM - Hive ACID by Eugene Koifman, Cloudera, Apache Hive PMC
7:00 - 7:30 PM - Hudi by Vinoth Chandar, Uber, Apache Hudi PPMC
7:30 - 8:00 PM - Iceberg by Ryan Blue, Netflix, Apache Iceberg PPMC
8:00 - 8:30 PM - General Q&A

“Hive ACID” by Eugene Koifman, Cloudera, Apache Hive PMC

Apache Hive has matured over time to support more of the features found in traditional databases. This talk will cover support for ACID transactions and data modification operations in Apache Hive 3.0, and will describe the intended use cases and the architecture of the implementation.
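A minimal sketch, assuming a toy in-memory model rather than Hive's actual ORC-based file layout: stored rows are never modified, an update is recorded as a delete of the old row plus an insert of the new one under a fresh write ID, readers merge the two on the fly, and compaction later folds the deltas back together. ToyAcidTable, RowId and the method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RowId:
    write_id: int  # write that produced the row
    row_num: int   # position of the row within that write


@dataclass
class ToyAcidTable:
    """Hypothetical append-only table: stored rows are never edited in place."""
    inserts: list = field(default_factory=list)  # (RowId, row) pairs, like insert deltas
    deletes: set = field(default_factory=set)    # deleted RowIds, like delete deltas
    next_write_id: int = 1

    def _begin_write(self):
        write_id = self.next_write_id
        self.next_write_id += 1
        return write_id

    def _append(self, write_id, rows):
        for i, row in enumerate(rows):
            self.inserts.append((RowId(write_id, i), row))

    def read_with_ids(self):
        # Merge-on-read: hide every row whose RowId was recorded as deleted.
        return [(rid, row) for rid, row in self.inserts if rid not in self.deletes]

    def read(self):
        return [row for _, row in self.read_with_ids()]

    def insert(self, rows):
        self._append(self._begin_write(), rows)

    def update(self, predicate, change):
        # An update becomes "delete old version + insert new version",
        # both recorded under one new write ID.
        write_id = self._begin_write()
        new_rows = []
        for rid, row in self.read_with_ids():
            if predicate(row):
                self.deletes.add(rid)
                new_rows.append(change(row))
        self._append(write_id, new_rows)

    def compact(self):
        # Fold deltas into a single "base": keep live rows, drop delete markers.
        self.inserts = self.read_with_ids()
        self.deletes.clear()


table = ToyAcidTable()
table.insert([("a", 1), ("b", 2)])
table.update(lambda r: r[0] == "b", lambda r: (r[0], r[1] * 10))
print(table.read())  # [('a', 1), ('b', 20)]
```

In this model, reads get slower as delete markers accumulate, which is why the sketch includes a compaction step that rewrites only the live rows.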

Eugene Koifman is an Apache Hive committer and PMC member. For the last 6 years, he has been on the Hive team at Hortonworks (now Cloudera), concentrating on adding support for transactional tables and operations such as Update and Merge. Before that, he worked on a federated SQL engine at Composite Software and held various engineering roles at BEA, Oracle, and others.

“Hudi” by Vinoth Chandar, Uber, Apache Hudi PPMC

The Hudi storage system was created at Uber. This talk will cover how Hudi provides atomic upserts and incremental change streams directly on top of typical Hadoop-compatible big data storage (HDFS/S3/GCS). It will cover motivating use cases and also briefly touch on the conceptual underpinnings, tradeoffs, and key design choices. The speakers will share how Hudi is leveraged at Uber and provide code recipes using the Hudi toolset for common tasks like database ingestion and log de-duplication.
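A minimal sketch, assuming a toy in-memory model rather than Hudi's actual storage layout or API: each record carries the commit instant that last wrote it, an upsert batch is staged and then published in one step, and an incremental read returns only records written by commits after a given instant. ToyUpsertTable and its methods are hypothetical; in Hudi itself, atomic visibility comes from its commit timeline rather than a dictionary swap.

```python
from dataclasses import dataclass, field


@dataclass
class ToyUpsertTable:
    """Hypothetical key-based table with a commit timeline (not Hudi's API)."""
    records: dict = field(default_factory=dict)   # record key -> (commit_time, value)
    timeline: list = field(default_factory=list)  # completed commit instants, ascending

    def upsert(self, commit_time, batch):
        # Instants must increase so that incremental reads stay well defined.
        if self.timeline and commit_time <= self.timeline[-1]:
            raise ValueError("commit_time must be later than the last commit")
        # Stage changes on a copy, then publish them in one step: readers see
        # either the whole batch or none of it.
        staged = dict(self.records)
        for key, value in batch:
            staged[key] = (commit_time, value)  # insert a new key or replace an existing one
        self.records = staged
        self.timeline.append(commit_time)

    def snapshot(self):
        # Latest value of every record key.
        return {k: v for k, (_, v) in self.records.items()}

    def incremental(self, since):
        # Change stream: only records last written by commits after `since`.
        return {k: v for k, (t, v) in self.records.items() if t > since}


table = ToyUpsertTable()
table.upsert("001", [("u1", "a"), ("u2", "b")])
table.upsert("002", [("u2", "B"), ("u3", "c")])
print(table.snapshot())          # {'u1': 'a', 'u2': 'B', 'u3': 'c'}
print(table.incremental("001"))  # {'u2': 'B', 'u3': 'c'}
```

Keying upserts on a record key is also what makes de-duplication cheap in this model: a repeated key within or across batches simply overwrites the earlier value.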

Vinoth Chandar is a Staff Software Engineer at Uber, where he architected much of its foundational big data infrastructure. Vinoth has a keen interest in data systems and years of experience working on diverse distributed systems such as key-value stores, database replication, storage engines, and cluster management.

“Iceberg” by Ryan Blue, Netflix, Apache Iceberg PPMC

This talk will focus on how Apache Iceberg commits changes to a table and handles conflicts between concurrent writers. It will also cover some background on Iceberg, including design choices and how those choices affect concurrency. Last, this talk will include a project status update and some recent lessons learned.
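A minimal sketch of optimistic concurrency, assuming a toy in-memory model rather than Iceberg's real catalog or Java API: a writer reads the current table version, prepares a new one, and publishes it with an atomic compare-and-swap on the catalog's version pointer. If another writer committed first, the swap fails and the change is re-applied to the newer version, unless validation finds a genuine conflict. TableVersion, ToyCatalog, commit and CommitConflict are hypothetical names, and real Iceberg validation is richer than the single check shown.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class TableVersion:
    number: int
    files: frozenset  # data files visible in this version of the table


class CommitConflict(Exception):
    pass


class ToyCatalog:
    """Hypothetical catalog that stores a pointer to the current table version."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = TableVersion(0, frozenset())

    def compare_and_swap(self, expected, new):
        # The only atomic step: swap the pointer only if no one else committed
        # after `expected` was read.
        with self._lock:
            if self.current is not expected:
                return False
            self.current = new
            return True


def commit(catalog, add=(), remove=(), max_retries=4):
    """Optimistically commit a change, re-applying it to newer versions on conflict."""
    add, remove = frozenset(add), frozenset(remove)
    for _ in range(max_retries + 1):
        base = catalog.current
        # Validation: the change cannot be re-applied if a file it removes is gone.
        if not remove <= base.files:
            raise CommitConflict(f"already removed: {sorted(remove - base.files)}")
        new = TableVersion(base.number + 1, (base.files - remove) | add)
        if catalog.compare_and_swap(base, new):
            return new
        # Another writer won the race: loop to refresh and retry on its version.
    raise CommitConflict("gave up after repeated concurrent commits")


catalog = ToyCatalog()
commit(catalog, add={"a.parquet"})
stale = catalog.current             # a writer reads version 1 ...
commit(catalog, add={"b.parquet"})  # ... but another writer commits version 2 first
print(catalog.compare_and_swap(stale, TableVersion(2, stale.files)))  # False
print(sorted(catalog.current.files))  # ['a.parquet', 'b.parquet']
```

In this model, only the pointer swap is serialized; preparing the new version can happen without holding any lock, and a lost race costs a cheap retry rather than a failed job.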

Ryan Blue works on open source components of Netflix's data platform, including Spark and Iceberg. He created Apache Iceberg to solve many of the persistent scale and usability problems that he hit while working on Hadoop ecosystem projects for the past 8 years. He is an Apache member and a PMC member of Apache Parquet and Avro.
