November Meetup in Karlsruhe


Details
Next Hadoop / Big Data meetup: November 14th in Karlsruhe, at Höpfner Burgstüble (http://www.burgstueble-schalander.de/).
Meet you there!
Schedule: November 14, 2013
18:00 Socializing & beer
18:30 Bug bites Elephant: Test-driven Quality Assurance in Big Data Application Development, Dominik Benz, Inovex
19:00 Big Data Discovery and Analytics on Hadoop: As Simple as Excel, Harald Müller & Peter Jeitschko, Datameer
Talk 1: Bug bites Elephant: Test-driven Quality Assurance in Big Data Application Development
Today's large piles of Big Data attract a mixed crowd: Business Engineers define which insights would be valuable, Analysts build models, Hadoop programmers tame the flood of data, and Operations people set up machines and networks. The interplay of all these participants is central to project success. This setup, together with the distributed nature of processing, poses new challenges to well-established models of assuring software artifact quality: How can non-programmers define acceptance criteria? How can functionality that depends on cluster execution, or on the orchestration of, e.g., different Hadoop jobs, be tested without delaying the development process? Which data selection best simulates the live environment? How can intermediate results in arbitrary serialization formats be inspected?
In this talk, experiences and best practices from approaching these problems in a large-scale log data analysis project will be presented.
At 1&1, our team develops Hadoop applications that process roughly 1 billion log events (~1 TB) per day. We will give an overview of the hardware and software setup of our quality assurance environment, which includes FitNesse as a wiki-style acceptance testing framework. Starting from a comparison with existing test frameworks such as MRUnit, we will explain how we automate the parameterized deployment of our applications, choose test data sampling strategies, manage workflows and orchestrate jobs and applications, and use Pig to inspect intermediate results and define final acceptance criteria. Our conclusion is that test-driven development in the field of Big Data requires adapting existing paradigms, but is crucial for maintaining high quality standards in the resulting applications.
Dr. Dominik Benz studied Computer Science with a minor in Psychology at the University of Freiburg, Germany. During his PhD at the Knowledge and Data Engineering Group (University of Kassel), he applied Data Mining and Knowledge Discovery methods to large datasets from Social Web systems in order to discover emergent semantic structures. Since November 2012, he has been working as a Big Data Engineer at Inovex GmbH, focusing on quality-driven development of Hadoop applications in Business Intelligence contexts.
Talk 2: Big Data Discovery and Analytics on Hadoop: As Simple as Excel
Datameer’s end-to-end Big Data Discovery and Hadoop analytics solution is designed to deliver the fastest time to insight in any data. Anyone can use Datameer’s wizard-based data integration, iterative point-and-click analytics, and drag-and-drop visualizations to get the broadest possible view of their organization. Where business intelligence provides answers to known questions, big data discovery reveals unknown patterns, relationships and insights. Leveraging the power of Hadoop, Datameer makes it easy for anyone to discover insights in any data right away. Big data analytics use cases span every vertical industry in three primary areas: marketing and CRM, fraud and risk management, and operational intelligence. Datameer is as simple as a spreadsheet, with 240+ pre-built, point-and-click functions, ranging from simple joins and filters to advanced sentiment analysis and predictive analytics on structured and unstructured data. You can also write your own functions in Java, R, Python and more.
Founded by Hadoop veterans in 2009, Datameer scales from a laptop to thousands of nodes and is available for all major Hadoop distributions including Apache, Cloudera, EMC, Hortonworks, IBM, MapR, Yahoo!, Amazon and Microsoft Azure.
Harald Müller is Senior Director Sales EMEA at Datameer. Peter Jeitschko is Solution Engineer EMEA at Datameer.
