Dealing with data from multiple sources, each with its own format, can be challenging. MapReduce can be limiting when you want to process non-text data streams. In this presentation we'll talk about some of the challenges the analytics team at Knewton has faced ingesting data from multiple sources and introduce ways of writing your own input and output formats as well as record readers and writers. We'll present real applications, introduce the KassandraMRHelper (https://github.com/Knewton/KassandraMRHelper) and talk about the advantages of creating custom input and output formats.
Giannis Neokleous (www.giann.is) is a software engineer on the Analytics Team at Knewton, with experience in large distributed systems handling terabytes of data. Giannis holds a master's degree from Stanford University.
6:30 to 7:00 PM - Networking
7:00 to 8:00 PM - Knewton Presentation
8:00 to 8:30 PM - Q&A/Networking
If your company is interested in sponsoring this event, please reach out to me.