BDNSHH November


Details
Sessions
- Schedoscope – Painfree Agile Hadoop Datahub Development and Operation
Speaker: Dr. Gerd Utz Westermann, Otto Group
In an agile environment, standard schedulers such as Oozie make developing, testing, and scheduling jobs for a Hadoop datahub a painful task full of dull overhead. Particular pain points are developing new aggregates and changing the logic of existing ones: identifying, preparing, and executing migrations of both data and schema are tedious, manual, and error-prone tasks.
Schedoscope is a Scala-based scheduling framework for integrated specification of data structure, data dependencies, and computation logic via a concise DSL.
Schedoscope not only supports demand-driven materialization of both table schemas and table data but also automatically recognizes and deals with changes to data structure and logic. In addition, it features a productive test framework allowing for test-driven development of aggregates.
Thus, Schedoscope facilitates an agile development process and significantly reduces operations overhead. As a result, Schedoscope empowers a small team in the Otto Group’s Business Intelligence unit to operate and grow a Hadoop datahub with web analytics data for more than 60 of the Group’s shops in an agile fashion.
Schedoscope is open-source software available at http://schedoscope.org
- Teaming up heterogeneous sports data - Building a 360° view of the FIFA World Cup
Speaker: Jochen Jörg, MarkLogic GmbH
Starting with heterogeneous data sets such as JSON, XML, HTML, and binary data, we will illustrate the journey of building a web application that unveils interesting new insights about the world’s biggest single sporting event: the FIFA World Cup 2014. We will focus on the importance of agile data processing and on how to interrogate the data to find answers about the tournament and its players that were previously not available using other technologies. The demo will show how flexibility in the data layer, combined with extensive APIs, provides the key features for building Big Data solutions.
Jochen works as Principal Technologist at MarkLogic GmbH. His main areas of interest are software architecture, software development, data modeling, and data processing with NoSQL technologies. Jochen helps organizations develop ideas and concepts for implementing solutions and applications based on heterogeneous and disparate data.
