Hi All, I'm happy to announce another Flink meetup. This time we'll have two presentations. Olga Reznik and Jens Kat from ING will present "Exploiting Apache Flink's Stateful Operators to unify Historical and Real-Time Processing", and Timo Walther, an Apache Flink PMC member from data Artisans (the company behind Flink), will present "Table & SQL API - unified APIs for batch and stream processing".
• 18:00 Arrive, mingle, pizza, drinks etc.
• 18:45 Exploiting Apache Flink's Stateful Operators to unify Historical and Real-Time Processing by Olga Reznik and Jens Kat
Apache Flink is a framework that allows you to do streaming out of the box, especially for stateless event processing. However, an event by itself typically does not contain sufficient information for its processing, and this is where Flink’s stateful processing comes to the rescue. It allows enriching the data of the event with historical data, i.e. data acquired from previous events. To support this, Flink provides a way to distribute the state and the processing by key. This enables us to, for example, compute statistical functions per key.
Sometimes, though, an event contains not a single key but multiple keys. For instance, an order can contain a customerId, a supplierId and several productIds. This means that the state needs to be distributed and aggregated in multiple ways. We will present a solution where an event is enriched with multiple aggregates without shuffling all the data across the network multiple times, hence keeping latency at a sub-second level. The resulting enriched event is used as input to score machine learning models in a streaming fashion.
By using control streams, we can dynamically update the definitions of our machine learning models, which gives us high flexibility without bringing the Flink job down.
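To make the ideas in this abstract concrete, here is a minimal plain-Python sketch. It is not Flink code and does not reproduce the speakers' solution (in particular, it says nothing about how shuffling is avoided). It only illustrates what it means to enrich one event with aggregates kept under several keys, and to change a scoring rule through a separate control message while processing continues. The class names, the field names other than customerId and supplierId, and the threshold-based "model" are all assumptions made for illustration.

```python
from collections import defaultdict


class KeyedAggregate:
    """Running count and average per key; a stand-in for keyed state."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def update(self, key, amount):
        self.count[key] += 1
        self.total[key] += amount
        return {"count": self.count[key],
                "avg": self.total[key] / self.count[key]}


class Enricher:
    """Enriches each event with one aggregate per key field, then scores it."""

    KEY_FIELDS = ("customerId", "supplierId")  # hypothetical subset of keys

    def __init__(self, threshold):
        self.state = {field: KeyedAggregate() for field in self.KEY_FIELDS}
        # Hypothetical "model definition": flag amounts above
        # threshold * the customer's running average.
        self.threshold = threshold

    def on_control(self, message):
        # A control message swaps the model definition without a restart.
        self.threshold = message["threshold"]

    def on_event(self, event):
        enriched = dict(event)
        for field, aggregate in self.state.items():
            enriched[field + "_agg"] = aggregate.update(event[field],
                                                        event["amount"])
        customer_avg = enriched["customerId_agg"]["avg"]
        enriched["flagged"] = event["amount"] > self.threshold * customer_avg
        return enriched


if __name__ == "__main__":
    enricher = Enricher(threshold=2.0)
    print(enricher.on_event(
        {"customerId": "c1", "supplierId": "s1", "amount": 10.0}))
    enricher.on_control({"threshold": 1.5})
    print(enricher.on_event(
        {"customerId": "c1", "supplierId": "s2", "amount": 50.0}))
```

In a real Flink job the per-key state would live in distributed keyed state and the control messages would arrive on their own stream; here both are collapsed into a single in-memory object for readability.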
• 19:45 Table & SQL API - unified APIs for batch and stream processing by Timo Walther
SQL is undoubtedly the most widely used language for data analytics. It is declarative and can be optimized and efficiently executed by most query processors. Therefore, the community has made an effort to add relational APIs to Apache Flink: a standard SQL API and a language-integrated Table API.
Both APIs are semantically compatible and share the same optimization and execution path based on Apache Calcite. Since Flink supports both stream and batch processing and many use cases require both kinds of processing, we aim for a unified relational layer.
In this talk we will look at the current API capabilities, find out what's under the hood of Flink’s relational APIs, and give an outlook on future features such as dynamic tables, Flink's way of converting streams into tables and vice versa by leveraging the stream-table duality.
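For readers unfamiliar with the stream-table duality mentioned above, the following plain-Python sketch shows the general idea only: a stream of changes can be replayed into a table, and the difference between two table versions can be emitted again as a stream of changes. This is not Flink's dynamic tables API; the function names and the "upsert"/"delete" record format are assumptions chosen for illustration.

```python
def stream_to_table(changelog):
    """Materialize a changelog of (op, key, value) records into a table."""
    table = {}
    for op, key, value in changelog:
        if op == "upsert":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
    return table


def table_to_stream(old, new):
    """Emit the changelog that turns table `old` into table `new`."""
    changes = []
    # Keys that disappeared become delete records.
    for key in old:
        if key not in new:
            changes.append(("delete", key, None))
    # New or modified keys become upsert records.
    for key, value in new.items():
        if key not in old or old[key] != value:
            changes.append(("upsert", key, value))
    return changes


if __name__ == "__main__":
    changelog = [
        ("upsert", "alice", 1),
        ("upsert", "bob", 1),
        ("upsert", "alice", 2),
        ("delete", "bob", None),
    ]
    table = stream_to_table(changelog)
    print(table)
    print(table_to_stream({}, table))
```

Replaying the changes produced by `table_to_stream` through `stream_to_table` yields the same table again, which is the round trip the duality refers to.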
• 21:30 Everybody out
Olga Reznik has been a software engineer for 7 years. She has enjoyed working with a range of programming languages, and is now having fun developing in Scala. She joined ING Bank Netherlands in 2014 after finishing a data reverse engineering thesis project for ING. Currently, Olga is a part of the squad that develops a streaming platform for fraud detection using Apache Flink.
Jens Kat is a software engineer at ING Bank Netherlands. He loves to play around with functional programming, and streaming analytics is a perfect use case for it. In his working life he currently builds an analytics platform using Apache Flink and Scala. Besides programming, he shares knowledge by teaching training sessions on the Scala programming language.
Timo Walther is a PMC member of Apache Flink® and works as a software engineer at data Artisans. He studied Computer Science at TU Berlin, worked at IBM Germany, and participated in the Database Systems and Information Management Group of TU Berlin.