Advanced Data Storage Technologies - AVRO and Parquet


Details
We are thrilled to have Dave Smelker from ADURANT Technologies present on Advanced Data Storage Technologies, providing a technical overview of Avro and Parquet. The discussion will also include practical examples of deploying both Avro and Parquet with Apache Spark. Avro is an Apache project that is gaining significant momentum for handling and storing large volumes of serialized data, and Parquet is a maturing Apache project that provides columnar storage for any project in the Hadoop ecosystem. Come out and learn how to apply these technologies in your organization!
Agenda
• 6:00 – 6:30 - Socialize over food and drink
• 6:30 – 6:45 - Announcements, Upcoming Events
• 6:45 – 8:00 - Technical Overview of Avro and Parquet - Dave Smelker
• 8:00 – 8:30 - Deployment of Avro and Parquet with Apache Spark - Dave Smelker
• 8:30 – ??? - Continued socializing
About the presentation
This presentation will include a technical deep dive into Avro, covering how it works, when to use it, the ease of adding data elements, and the benefits of compression. The deep dive into Parquet will cover how it works, when to use it, scaling data with columnar compression, technical limitations and disadvantages, and native support in Hive. The session will wrap up with practical deployment examples, including how to deploy Avro with Amazon SNS and use Spark to load data into a dynamic Avro schema, as well as how to take advantage of Parquet and Spark performance synergies to quickly store large, semi-structured data sets.
