Apache Flink: Unifying batch and streaming modern data analysis

This is a past event

85 people went

Details

Apache Flink (http://flink.apache.org/) is an open-source framework for data analysis, consisting of a streaming dataflow engine as well as built-in APIs and libraries.

Apache Flink joined the Apache Incubator in 2014 and graduated as a top-level project in December 2014. Since entering the Apache family, Flink has grown considerably in both code and community.

Starting with a dataflow engine, the Flink community has added fluent programming APIs for batch, iterative, and stream processing, as well as libraries that use these APIs such as Table (SQL-like queries), FlinkML (a Machine Learning library), and Gelly (an API and library for graph analysis). Flink has more than 100 contributors, and is one of the most active big data projects in Apache.

As data streaming is becoming very popular, there is a lot of interest in stream processing frameworks like Flink that provide a combination of low latency, mutable state, and high-level programming APIs. What is unique in Flink is that the underlying engine has built-in support for diverse workloads without compromising on performance or usability. For example, the system executes stream processing natively, and models batch programs as streaming programs on finite data streams. Another area that Flink pioneered is memory management inside the JVM.
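To make the "batch as a finite stream" idea concrete, here is a minimal sketch in plain Python. It is not Flink's API; the function names (`running_word_count`, `batch_word_count`, `endless_words`) are hypothetical and exist only for illustration. The same stateful operator is run once over a finite input (a batch job that terminates) and once over an unbounded source (a streaming job that is only sampled).

```python
from itertools import count, islice
from typing import Dict, Iterable, Iterator, Tuple


def running_word_count(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Streaming operator: emit an updated (word, count) pair for every record."""
    counts: Dict[str, int] = {}  # state kept by the operator between records
    for word in words:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


def batch_word_count(words: Iterable[str]) -> Dict[str, int]:
    """Batch job: run the same operator on a finite stream, keep the last value per key."""
    final: Dict[str, int] = {}
    for word, n in running_word_count(words):
        final[word] = n
    return final


def endless_words() -> Iterator[str]:
    """Hypothetical unbounded source that cycles through three words forever."""
    vocab = ["to", "be", "or"]
    for i in count():
        yield vocab[i % len(vocab)]


# Batch: the input is finite, so the job ends with a final result.
batch_result = batch_word_count(["to", "be", "or", "not", "to", "be"])

# Streaming: the source never ends, so we only look at the first five updates.
first_updates = list(islice(running_word_count(endless_words()), 5))
```

The point of the sketch is that the batch job needs no separate engine: it is the streaming operator plus the knowledge that the input ends.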

For the first meetup, we are lucky to have three of the initiators of the Apache Flink project, all current PMC members: Robert Metzger, Stephan Ewen, and Kostas Tzoumas. Henry Saputra, a current PMC member and one of Flink's mentors during incubation, will also come by. They will talk about the past, present, and future of Flink.

The final schedule and talks for the first meetup are still being finalized. Please join us for an exciting evening at the first Apache Flink Bay Area meetup.

Speaker Bios:

Robert Metzger is a committer at Apache Flink and co-founder and software engineer at data Artisans. Robert studied Computer Science at TU Berlin and worked at IBM Germany and at the IBM Almaden Research Center in San Jose.

Stephan Ewen is a committer and Vice President of Apache Flink and co-founder and CTO of data Artisans. Before founding data Artisans, Stephan led the development of Flink from the early days of the project (then called Stratosphere). Stephan has a PhD in Computer Science from TU Berlin.

Kostas Tzoumas is a committer at Apache Flink and co-founder and CEO of data Artisans. Before founding data Artisans, Kostas was a postdoctoral researcher at TU Berlin and received a PhD in Computer Science from Aalborg University.

Tentative agenda for the evening:

6:30 - 7:00 :: Doors open and socializing

7:00 - 7:05 :: Introduction

7:05 - 7:15 :: Community Updates

7:15 - 8:30 :: Deep dive and live demo

8:30 - 9:00 :: Q&A and closing

Abstract of the Deep-Dive talk

At the heart of Apache Flink is a flexible dataflow engine that supports diverse features and workloads without compromising on performance or usability. The engine executes data streaming programs directly as streams (with low latency and flexible user-defined state), and models batch programs as streaming programs on finite data streams. Iterative programs are supported through feedback in the dataflow, and graph analysis through "delta-iterations". Through elaborate memory management inside the JVM, Flink scales beyond main-memory-resident data sets.
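As a rough illustration of what a delta iteration does, the sketch below computes connected components in plain Python. It is a simplified model and not Flink's API: the names `connected_components`, `solution`, and `workset` are assumptions chosen for this example. Each round only processes the vertices whose label changed in the previous round, rather than recomputing the whole graph.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def connected_components(
    vertices: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Hashable, Hashable]:
    """Label every vertex with the smallest vertex id in its component."""
    vertices = list(vertices)
    neighbors: Dict[Hashable, Set[Hashable]] = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    solution = {v: v for v in vertices}  # current label of every vertex
    workset = set(vertices)              # vertices whose label changed last round

    while workset:
        # Changed vertices propose their label to their neighbors;
        # a proposal is kept only if it improves on the best label seen so far.
        candidates: Dict[Hashable, Hashable] = {}
        for v in workset:
            label = solution[v]
            for n in neighbors[v]:
                if label < candidates.get(n, solution[n]):
                    candidates[n] = label

        # Apply the improvements; only updated vertices go into the next round.
        solution.update(candidates)
        workset = set(candidates)

    return solution


components = connected_components(
    vertices=[1, 2, 3, 4, 5],
    edges=[(1, 2), (2, 3), (4, 5)],
)
```

The loop stops as soon as a round produces no updates, so the work per round shrinks as the labels settle.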

On top of the dataflow engine, the Flink community has added fluent programming APIs for batch and stream processing, as well as a set of libraries, such as the Table API (relational queries), FlinkML (Machine Learning library), and Gelly (API and library for graph analysis).

This talk will present the architecture of Flink and discuss the design choices and tradeoffs that come with building a versatile analysis engine on top of a data streaming abstraction. We will show examples and use cases, and give an outlook on current developments in the Flink project.