Apache Flink @ Teralytics


Details
In this meetup, we will look at Flink use cases in academia and industry.
We will have three talks:
- an introduction and community update covering what's new in Flink since we last met
- a Flink use case from our hosts, Teralytics
- a talk on how researchers in the Information Security Group at ETH Zürich use Flink for online monitoring
Schedule
6:00pm - 6:15pm: Arrival
6:15pm - 6:45pm: Talk #1
6:45pm - 7:15pm: Talk #2
7:15pm - 7:30pm: Break
7:30pm - 8:00pm: Talk #3
8:00pm onwards: Drinks and discussion
================================================================
Talk #1
Intro & community update
Vasia Kalavri, ETH Zürich
Abstract:
This talk will give new members a short overview of Flink and then focus on what is new in the community and the Flink ecosystem. We will review major recent features and improvements, such as the new deployment and process model, broadcast state, task-local state recovery, and SQL enhancements.
================================================================
Talk #2
Data Processing Use Cases at Teralytics using Flink
Bertrand Bossy, Teralytics
Abstract:
In this talk, we will explain why we chose Flink for some of our use cases. We will also share lessons learned from developing Flink applications and running them in production on Mesos, with monitoring based on InfluxDB and Grafana.
Bio:
Bertrand Bossy is a Tech Lead and Staff Software Engineer in Teralytics’ Platform team. Before joining Teralytics more than five years ago, Bertrand studied Computer Science at ETH Zürich.
At Teralytics, he has contributed to several products as well as to the platform that supports and enables 30+ engineers and data scientists.
Bertrand does most of his coding in Scala. In terms of data processing, he has experience with Hadoop, Storm, Spark, Kafka, cloud products and, most recently, Flink.
================================================================
Talk #3
Integrating Stateful External Processes into Flink
Joshua Schneider, ETH Zürich
Abstract:
The programming model of Apache Flink allows the specification of arbitrary transformations on data streams. However, it is not always desirable to re-implement an existing algorithm using Flink's API. We have added a custom operator to Flink that integrates a stateful external process into the dataflow. It provides fault tolerance for processes that expose an interface for taking state snapshots. If the process has no side effects, the operator guarantees exactly-once semantics for the output stream. This goes beyond the capabilities of Flink's built-in async I/O operator.
This extension was motivated by our work on online monitors, which detect complex temporal patterns in event streams. The new operator allowed us to parallelise an existing single-threaded monitoring tool by embedding it into a Flink application.
Bio:
Joshua Schneider is a doctoral student in the Information Security Group at ETH Zürich under the supervision of Prof. David Basin. He currently works on new monitoring algorithms that scale better to large data streams, and on their theoretical foundations. His general interests lie in formal methods with practical applications.