
Introducing Apache Flink - A new approach to distributed data processing


Details

The talk introduces the Apache Flink (incubating) project (http://flink.incubator.apache.org/), a new project at the Apache Software Foundation. Flink’s goal is to make large-scale distributed computing as simple as possible.

Doors will open at 6pm and close at 7pm. Please bring ID to the meetup event.

Flink seamlessly integrates with the Hadoop ecosystem and runs on top of HDFS and YARN. The Java and Scala APIs provide users with intuitive abstractions to express data-intensive computations. High-level operators such as map, reduce, or join are translated to a data-flow DAG. This translation step involves a cost-based optimizer that enables Flink applications to run with little (re-)configuration and little maintenance when the cluster characteristics change and the data evolves over time.
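As a rough illustration of what a chain of map, join, and reduce operators expresses, here is a minimal sketch in plain Python. It does not use Flink's Java or Scala APIs; the input data and all names (`visits`, `users`, `add_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input data, not taken from the talk.
visits = [("alice", "/home"), ("bob", "/docs"), ("alice", "/docs")]
users = [("alice", "DE"), ("bob", "US")]

# map: turn each visit into a (user, 1) pair
pairs = list(map(lambda visit: (visit[0], 1), visits))

# join: match each pair with the user's country on the user name
country_of = dict(users)
joined = [(country_of[user], n) for user, n in pairs if user in country_of]

# reduce: sum the counts per country
def add_count(acc, item):
    country, n = item
    acc[country] += n
    return acc

visits_per_country = dict(reduce(add_count, joined, defaultdict(int)))
print(visits_per_country)
```

In a system like the one described above, each of these steps would become a node in the data-flow DAG, and the optimizer would pick the execution strategy (for example, how to run the join) instead of the programmer hard-coding it as done here.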

Further down in Flink’s layered architecture, a highly optimized execution engine aggressively uses in-memory execution. If memory runs out, internal operators such as sorting and hashing gracefully degrade to disk-based execution. This makes execution robust both when memory is abundant and when it is under pressure.
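The general idea of a sort that degrades to disk can be sketched as an external merge sort. The following plain-Python sketch is not Flink's implementation; the function name `sort_with_budget` and the record-count budget `max_in_memory` are assumptions made for illustration (a real engine would budget bytes, not records).

```python
import heapq
import pickle
import tempfile

def _spill(sorted_run):
    """Write one sorted run to a temporary file and rewind it for reading."""
    f = tempfile.TemporaryFile()
    for item in sorted_run:
        pickle.dump(item, f)
    f.seek(0)
    return f

def _read(f):
    """Yield the records of a spilled run one at a time."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return

def sort_with_budget(records, max_in_memory):
    """Sort records, buffering at most max_in_memory of them before spilling."""
    runs = []    # sorted runs that were spilled to disk
    buffer = []  # records currently held in memory
    for record in records:
        if len(buffer) == max_in_memory:
            # Budget exhausted: sort the buffer and spill it as one run.
            runs.append(_spill(sorted(buffer)))
            buffer = []
        buffer.append(record)
    if not runs:
        return sorted(buffer)  # everything fit in memory, no disk I/O
    runs.append(_spill(sorted(buffer)))
    # Merge the sorted runs; the result is collected into a list here only
    # for convenience, a real engine would stream it to the next operator.
    merged = list(heapq.merge(*(_read(f) for f in runs)))
    for f in runs:
        f.close()
    return merged

print(sort_with_budget([5, 3, 1, 4, 2], max_in_memory=2))
```

When the input fits within the budget, the function never touches disk; when it does not, only the merge step reads the spilled runs back, which is the "graceful degradation" behaviour in miniature.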

Other highlights are the underlying streaming engine, which supports both batch and streaming workloads, and the native closed-loop iteration operators, which make graph analysis and machine learning applications very fast on the platform.

Flink joined the Apache Incubator in April 2014. It is a very active open-source project with more than 50 contributors from both academia and industry. Flink has already had three major releases in 2014, with the next (0.7) planned for October.

Proposed schedule:

6-6:45pm - doors open and social

6:45-8pm - talk

8-8:30pm - questions and closing

About the Speakers:
Robert Metzger is a committer of the Apache Flink project. He joined the project in 2012, when Flink was still called “Stratosphere”, as a student research assistant at TU Berlin during his Master’s studies.
Robert has also worked at IBM in different departments, including the Almaden Research Lab in San Jose, CA.

Ufuk Celebi is a committer of the Apache Flink project. He joined the project as a student research assistant after studying at TU Berlin and ETH Zurich. With his experience in systems engineering, Ufuk makes sure that the low-level network stacks of Flink are blazing fast.

Both speakers recently co-founded Data Artisans, a Berlin-based startup that is committed to developing Flink further as open source.

Silicon Valley Hands On Programming Events
Pivotal EMC
3495 Deer Creek Road · Palo Alto, CA