Apache Arrow: The In-Memory Layer Behind Your Iceberg, Spark, and Parquet
Details
10 years of Arrow. 30 minutes to understand why it's everywhere.
If you work with modern data infrastructure, Arrow is almost certainly running somewhere in your stack. Most engineers never notice it.
Arrow solved a real problem: moving data between systems required serializing and deserializing at every boundary, burning CPU cycles, memory copies, and latency. At scale, that cost compounds fast. Arrow's answer was a language-agnostic columnar memory format that any system could share without copying. What started as a memory-layout spec became the execution substrate of the modern data stack.
In this 30-minute session, Badal Singh, a contributor to Apache Iceberg Go and the builder of OLake's Arrow-based ingestion writer (550,000+ rows/second), will cover:
- From niche interoperability project to de-facto standard: Apache Arrow's 10-year journey
- What Arrow actually is beyond "columnar in-memory format" and why that definition undersells it
- How zero-copy data sharing eliminates serialization overhead and what that means for pipeline performance
- Where Arrow runs today: Spark, Pandas, ClickHouse, Polars, and inside open table format implementations like Apache Iceberg Go
- What's next: Arrow Flight, ADBC, nanoarrow, and the ecosystem reshaping how data systems talk to each other
