19th Swiss Big Data User Group Meeting

137 people went

ETH Zurich, Room HG D 3.2

Rämistrasse 101 · Zürich

How to find us

ETH Zurich, Main Building (HG), Room D 3.2, Rämistrasse 101, 8092 Zürich


18:00 Welcome & Intro

Title: Data Science and Data Products at Neue Zürcher Zeitung

At 236 years old, Neue Zürcher Zeitung (NZZ) is one of the oldest newspapers still in publication. Despite its age, NZZ is far from old-fashioned: over the last three years it has invested heavily in data-driven decision making and data-driven innovation. We have been using Apache Spark to wrangle large amounts of data for almost a year now, and we do not regret this choice. It has not only given us flexibility for ad-hoc analytics, but also drives our data products in production. In this talk I will share some of our use cases as well as insights we gained over the last year with Apache Spark. In particular, I will talk about how we calculate article recommendations and showcase some exciting new data products that are currently in active development.

René Pfitzner is the Lead Data Scientist at NZZ, Switzerland’s newspaper of record. He is interested in media innovation, especially algorithmic approaches for news media. Before joining NZZ he was a research assistant at ETH Zurich, where his research broadly revolved around the topics “network theory” and “information networks”.

Title: Real-Time Alarm Verification with Spark Streaming and Machine Learning

False alarms are not only a nuisance but also costly. According to various security services, 90% of reported incidents are false alarms, and they often trigger emergency services such as firefighters or police unnecessarily.

In this talk we share our experiences in building a real-time alarm verification service with the industry leader in secure alarm transmission. The goal of our system is to help human responders decide whether or not to trigger costly intervention forces. In particular, we will give insights into applying various machine learning algorithms to identify false alarms and to discover alarm patterns in real time. We also present a performance study on using Spark for streaming and historical data analytics, as well as online machine learning, to provide more evidence about potential false alarms.

Ana Sima, Jan Stampfli, Kurt Stockinger from ZHAW Zürich University of Applied Sciences

Title: Introduction to Apache Flink

Apache Flink is a modern data streaming framework with a unique combination of features such as high throughput, millisecond latency, powerful windowing abstractions, and support for event time and out-of-order streams. Flink enables developers to implement advanced streaming applications and execute them at scale in a fault-tolerant manner. Over the past couple of years, Flink has gained popularity and has built a vibrant community of users and contributors around it. Today, the project counts more than 250 contributors and powers business-critical applications at several companies, including Alibaba, Bouygues Telecom, King.com, and Zalando. In this talk, I will provide an overview of Apache Flink and show how it enables new streaming applications and architectures. I will also discuss exciting new features of the upcoming 1.2 release, such as dynamic scaling and queryable state.

Vasia (Vasiliki) Kalavri is a PMC member of Apache Flink and a core developer of its graph processing API, Gelly. She has recently moved to Zurich and is joining the ETH Systems group as a postdoctoral researcher in January. Vasia has a PhD in Distributed Computing from KTH, Stockholm and UCLouvain, Belgium, and has previously interned at Telefonica Research and data Artisans.