Topic: Big Data - How Gilt Manages Real-time Data Capturing with Kafka, Avro and Hadoop/Hive
Michael Hansen - Principal Data Engineer Gilt Groupe
7:00 - Arrival - Snacks, Pizza and Networking.
7:30 - Introduction / Announcements by the organizers.
7:35 - 7:50 - Demos / Pitches
Demo 1 - SuperDealyo - Sach Kangovi
Demo 2 - Gruberie - Sven Hermann
Demo 3 - Outdoor Exchange (OX) - Dariusz Jamiolkowski
7:50 - How Gilt Manages Real-time Data Capturing with Kafka, Avro and Hadoop/Hive - Michael Hansen - Principal Data Engineer - Gilt Groupe
8:45 - Open-mic to quickly promote your business or broadcast a need that someone in the group might be able to fill.
8:55 - Wrap-Up, discussion of Meetup, feedback and opportunities for improvement or future topics.
8:59 - End of formal part of meeting.
9:00 - Exit Venue and head to After Hours Party - Location: TBA
More about this Event:
SuperDealyo: Presentation and demo by the SuperDealyo team of a unique location-based, shopping-list-driven platform bringing the lowest prices to YOUR fingertips.
Gruberie: Your one-stop gateway to great food deals.
Outdoor Exchange (OX): A trusted community-based platform where supply and demand for outdoor gear rentals meet.
Michael Hansen - Principal Data Engineer - Gilt Groupe
Large-scale, real-time (or near-real-time) data capture of clickstream and messaging events has become much more practical with the combination of Kafka and Hadoop. However, without a backward-compatible data structure for these events, a lot of unnecessary transformation and formatting work is left to the data consumers. This is where a data serialization system or framework such as Protocol Buffers, Apache Avro, or Apache Thrift can come to the rescue. This talk will focus on how Gilt uses the trio of Kafka, Avro, and Hadoop/Hive to manage and control the data structure of real-time events passed into HDFS/Hive and/or consumed by other web services.
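To make the idea of a backward-compatible event structure concrete, here is a minimal sketch of how Avro-style schema resolution lets a newer consumer read older events. It uses only the Python standard library and imitates the record resolution rules from the Avro specification; it does not use the Avro library itself. The PageView record, its field names, and the resolve_record helper are hypothetical and are not Gilt's actual schemas or code.

```python
import json

# Hypothetical event schemas in Avro's JSON schema syntax.
# The record and field names are illustrative assumptions.
WRITER_SCHEMA_V1 = json.loads("""
{
  "type": "record",
  "name": "PageView",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "url", "type": "string"}
  ]
}
""")

# Version 2 adds an optional field with a default, which keeps
# the schema backward compatible with data written under version 1.
READER_SCHEMA_V2 = json.loads("""
{
  "type": "record",
  "name": "PageView",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "url", "type": "string"},
    {"name": "referrer", "type": ["null", "string"], "default": null}
  ]
}
""")


def resolve_record(datum, writer_schema, reader_schema):
    """Project an already-decoded record onto the reader's schema.

    Imitates three Avro record resolution rules:
    - fields present in both schemas are copied through;
    - writer fields unknown to the reader are ignored;
    - reader fields missing from the writer take the reader's default,
      and it is an error if no default is declared.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            resolved[name] = datum[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"reader field {name!r} has no default")
    return resolved


# An event captured before the schema gained the "referrer" field.
old_event = {"user_id": "u-123", "url": "/sale/42"}
new_view = resolve_record(old_event, WRITER_SCHEMA_V1, READER_SCHEMA_V2)
```

In this sketch the consumer sees the old event with referrer filled in as None, so no per-consumer transformation code is needed when the schema evolves. Adding a field without a default would instead raise an error, which is the kind of incompatible change a managed schema process is meant to catch.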