• Apache Flink @ Teralytics

    Teralytics AG

    In this meetup we will look into Flink use-cases in academia and industry. We will have 3 talks:
    - an intro and community update to learn what's new in Flink since we last met
    - a Flink use-case from our hosts, Teralytics
    - a talk on how researchers at the Information Security Group of ETH Zürich use Flink for online monitoring

    Schedule
    ----------------
    6:00pm - 6:15pm: Arrival
    6:15pm - 6:45pm: Talk #1
    6:45pm - 7:15pm: Talk #2
    7:15pm - 7:30pm: Break
    7:30pm - 8:00pm: Talk #3
    8:00pm - drinks and discussion

    ================================================================
    Talk #1
    ------------
    Intro & community update
    Vasia Kalavri, ETH Zürich

    Abstract: This talk will provide a short overview of Flink for our new members and then focus on what is new in the community and the Flink ecosystem. We will review major recent features and improvements, such as the new deployment and process model, broadcast state, task-local state recovery, and SQL enhancements.

    ================================================================
    Talk #2
    -----------
    Data Processing Use Cases at Teralytics using Flink
    Bertrand Bossy, Teralytics

    Abstract: In this talk we will show why we have chosen Flink for some of our use cases. We will share some lessons learned from developing applications using Flink and running them in production on Mesos with monitoring using InfluxDB and Grafana.

    Bio: Bertrand Bossy is a Tech Lead and Staff Software Engineer in Teralytics’ Platform team. Before joining Teralytics more than five years ago, Bertrand studied Computer Science at ETH Zurich. At Teralytics he has contributed to several products, as well as the platform supporting and enabling 30+ Engineers and Data Scientists. Bertrand does most of his coding in Scala. In terms of data processing, he has experience working with Hadoop, Storm, Spark, Kafka, Cloud Products and recently Flink.

    ================================================================
    Talk #3
    -----------
    Integrating Stateful External Processes into Flink
    Joshua Schneider, ETH Zürich

    Abstract: The programming model of Apache Flink allows the specification of arbitrary transformations on data streams. However, it is not always desirable to re-implement an existing algorithm using Flink's API. We have added a custom operator to Flink that integrates a stateful external process into the dataflow. It provides fault tolerance for processes that have an interface for state snapshots. If the process does not have side-effects, the operator guarantees exactly-once semantics for the output stream. This goes beyond the capabilities of the built-in async I/O operator. This extension was motivated by our work on online monitors, which detect complex temporal patterns in event streams. The new operator allowed us to parallelise an existing single-threaded monitoring tool by embedding it into a Flink application.

    Bio: Joshua Schneider is a doctoral student in the Information Security Group at ETH Zürich under the supervision of Prof. David Basin. He currently works on new monitoring algorithms that scale better to large data streams, and on their theoretical foundations. His general interests are formal methods with practical applications.

  • Data stream processing at PSI

    CAB H52, ETH Zurich

    For our second meetup, we're very excited to have a streaming use-case from the Paul Scherrer Institut and to host Kostas Kloudas, Apache Flink committer and software engineer at data Artisans.

    During the first part of the meetup, we will look into the use-case requirements, the current setup, and the solution architecture. During the second part of the meetup, Kostas will give us a talk on "Stateful Stream Processing with Apache Flink".

    Abstract
    As Apache Flink continues to push the boundaries of stateful stream processing, an increasing number of users are starting to realize the potential of stateful stream processing as a promising paradigm for robust and reactive data analytics as well as event-driven applications. This talk aims at covering the general idea and motivations of stateful stream processing, and how Flink enables it with its powerful set of state management features and programming APIs.

    After the talk, Kostas will provide his view on how the use-case can be implemented and deployed with Apache Flink. Finally, we will have an open discussion on the advantages and disadvantages of both solutions.

    Bio
    Kostas is a Flink Committer, currently working with data Artisans to make Apache Flink® the best open-source stream processing engine and your data's best friend. Before joining data Artisans, Kostas was a postdoctoral researcher at IST in Lisbon, and before that he obtained a PhD in Computer Science from INRIA (France). His main research focus was cloud storage and distributed processing.

  • Apache Flink Zurich Kickoff

    CAB H52, ETH Zurich

    Welcome to the 1st Apache Flink meetup in Zurich! We are very lucky to have Timo Walther visiting us all the way from Berlin for this occasion. We plan to have a short introduction talk followed by Timo's talk, which should take around 45 minutes. After the talk, we encourage you to stay around for some food, drinks, and discussion.

    Schedule
    ================
    7:00 pm - 7:15 pm: Welcome
    7:15 pm - 7:30 pm: Apache Flink intro
    7:30 pm - 8:15 pm: Table & SQL API – unified APIs for batch and stream processing
    8:15 pm - ... : open discussion & beer

    ======================
    Title: Table & SQL API – unified APIs for batch and stream processing

    Abstract: SQL is undoubtedly the most widely used language for data analytics. It is declarative and can be optimized and efficiently executed by most query processors. Therefore the community has made an effort to add relational APIs to Apache Flink: a standard SQL API and a language-integrated Table API. Both APIs are semantically compatible and share the same optimization and execution path based on Apache Calcite. Since Flink supports both stream and batch processing and many use cases require both kinds of processing, we aim for a unified relational layer. In this talk we will look at the current API capabilities, find out what’s under the hood of Flink’s relational APIs, and give an outlook on future features such as dynamic tables, Flink’s way of converting streams into tables and vice versa by leveraging the stream-table duality.

    Bio: Timo Walther is a PMC member of Apache Flink® and works as a software engineer at data Artisans. He studies Computer Science at TU Berlin, worked at IBM Germany, and participated in the Database Systems and Information Management Group of TU Berlin.
