Distributed Stream and Graph Processing with Apache Flink


Details
In this meetup we will present talks about real streaming and graph processing with Apache Flink. We will also present research and future projects related to Apache Flink.
Please join us for this great opportunity to learn more about Apache Flink.
Thank you so much to MapR for hosting and providing food and drinks for this meetup.
Apache Flink community update - Henry
Henry will present current community updates for Apache Flink, including releases, community growth, new features, adoption, and upcoming meetups.
Stateful distributed stream processing - Gyula
More complex streaming applications generally need to store some state of the running computations in a fault-tolerant manner. This talk discusses the concept of operator state and compares state management in current stream processing frameworks such as Apache Flink Streaming, Apache Spark Streaming, Apache Storm and Apache Samza.
We will go over the recent changes in Flink streaming that introduce a unique set of tools to manage state in a scalable, fault-tolerant way backed by a lightweight asynchronous checkpointing algorithm.
When Micro-batching Isn't Good Enough - Ted
I will describe several use cases where batch and micro-batch processing are not appropriate and explain why.
I will also describe what a true streaming solution needs to provide for solving these problems.
These use cases will be taken from real industrial situations, but the descriptions will drive down to technical details as well.
Large-Scale Graph Analysis with Apache Flink - Vasia
This talk will give an overview of Flink’s graph processing API, Gelly. We will discuss how iterative operators and other unique features of Flink make it a competitive alternative for large-scale graph processing. We will show how one can elegantly express graph analysis tasks using common Flink operators, and how different graph processing models, like vertex-centric and gather-sum-apply, can be easily mapped to Flink dataflows.
Looking Forward - Apache Flink research preview/roadmap - Paris
In this talk we will give a broad preview of the research behind Flink and the fundamentals we are working on to make Flink a use-case complete system for data processing. We will cover incremental checkpointing, windowing optimisations, machine learning pipeline design and streaming graph analytics on Flink.
Finally, we will introduce Karamel (karamel.io (http://karamel.io/)), a new way of deploying and reproducing experiments with Flink on EC2, GCE, OpenStack and bare metal.
Proposed Schedule
6pm-6:30pm Doors open and socializing
6:30pm-8:30pm Talks
Bio of Speakers
• Henry Saputra
Henry is a PMC member of Apache Flink and a member of the Apache Software Foundation. He is also a member of the Apache Incubator PMC and was a mentor of Apache Flink while it was in incubation.
Currently Henry is working on distributed systems and big data application platforms.
• Gyula Fóra
Gyula is a PMC member of the Apache Flink project, currently working as a researcher at the Swedish Institute of Computer Science. His main expertise and interests are real-time distributed data processing frameworks and their connections to other big data applications. He is a core architect of Apache Flink Streaming. His current work includes research and development on several aspects of stream processing, including fault tolerance, efficient windowing computations and streaming machine learning.
• Vasia Kalavri
Vasia Kalavri is a PhD student at KTH, Stockholm, doing research on distributed data processing, systems optimization and large-scale graph analysis. She is also a PMC member of Apache Flink, mainly working on Flink's graph processing API, Gelly.
• Paris Carbone
Paris is a PhD student in distributed computing at the Royal Institute of Technology in Sweden and a committer for Apache Flink. His work is focused on abstractions, domain specific languages and architectures towards scalable, expressive and cost-effective distributed data stream processing.
• Ted Dunning - Chief Application Architect, MapR
Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (LifeLock) and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.
