
Advanced Data Storage Technologies - AVRO and Parquet

Hosted By
Brett W.

Details

We are thrilled to have Dave Smelker from ADURANT Technologies present on Advanced Data Storage Technologies, giving a technical overview of Avro and Parquet. The discussion will also include practical examples of deploying both Avro and Parquet with Apache Spark. Avro is an Apache project that is gaining significant momentum for serializing and storing large volumes of data, and Parquet is a maturing Apache project that provides columnar storage for any project in the Hadoop ecosystem. Come out and learn how to apply these technologies in your organization!
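To give a feel for what "columnar storage" means before the talk, here is a small, self-contained Python sketch. It is not Parquet's file format or API: it pivots a few hypothetical row records into per-column lists and applies a simple run-length encoding (the Parquet specification includes a run-length/bit-packing hybrid encoding, among others). The record fields and helper names are illustrative assumptions only.

```python
from itertools import groupby

# Hypothetical sample records; field names are illustrative only.
rows = [
    {"id": 1, "country": "US", "status": "active"},
    {"id": 2, "country": "US", "status": "active"},
    {"id": 3, "country": "US", "status": "inactive"},
    {"id": 4, "country": "CA", "status": "active"},
    {"id": 5, "country": "CA", "status": "active"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists (columnar layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def from_columns(columns):
    """Rebuild row dicts from column lists (the inverse of to_columns)."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*(columns[n] for n in names))]


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded = {name: run_length_encode(values) for name, values in columns.items()}
for name, runs in encoded.items():
    print(name, runs)
```

Because each column holds values of one kind side by side, low-cardinality columns such as `country` collapse into a few runs, while a unique `id` column gains nothing from this particular encoding.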

Agenda

• 6:00 – 6:30 - Socialize over food and drink

• 6:30 – 6:45 - Announcements, Upcoming Events

• 6:45 – 8:00 - Technical Overview of Avro and Parquet - Dave Smelker

• 8:00 – 8:30 - Deployment of Avro and Parquet with Apache Spark - Dave Smelker

• 8:30 – ??? - Continued socializing

About the presentation

This presentation will include a technical deep dive into Avro, covering how it works, when to use it, how easily new data elements can be added, and the benefits of compression. The deep dive into Parquet will cover how it works, when to use it, scaling data with columnar compression, technical limitations and disadvantages, and native support in Hive. The session will wrap up with practical deployment examples, including how to deploy Avro with Amazon SNS, how to use Spark to load data into a dynamic Avro schema, and how to combine Parquet and Spark to store large, semi-structured data sets quickly.
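The "ease of adding data elements" in Avro comes from schema resolution: data written with one schema can be read with a newer one. The Python sketch below is a simplified, hypothetical illustration of the record rules in the Avro specification, not the Avro library's API: a reader field missing from the writer schema takes its declared default, a writer field absent from the reader schema is dropped, and a missing field with no default is an error. Type promotion and nested types are deliberately left out, and the `Event` schemas are assumptions for illustration.

```python
# Hypothetical schemas in Avro's JSON schema shape (as Python dicts).
writer_schema = {
    "type": "record", "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "source", "type": "string"},
    ],
}

# A newer reader schema that adds a field with a default value.
reader_schema = {
    "type": "record", "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "source", "type": "string"},
        {"name": "region", "type": "string", "default": "unknown"},
    ],
}


def resolve_record(datum, writer, reader):
    """Project a record written with `writer` onto the `reader` schema (simplified)."""
    writer_fields = {f["name"] for f in writer["fields"]}
    resolved = {}
    # Iterating only over reader fields drops writer-only fields.
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_fields:
            resolved[name] = datum[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing from the writer schema and has no default")
    return resolved


old_record = {"id": 42, "source": "web"}
print(resolve_record(old_record, writer_schema, reader_schema))
```

Old records remain readable after the schema gains `region`; in practice, Avro libraries perform this resolution for you when given both the writer and reader schemas.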

Boulder/Denver BigData Meetup
Level 3 Events Center - Entrance on the North Side
1025 Eldorado Blvd · Broomfield, CO