NOTE: This meetup is in Brussels!
For the occasion of the Hadoop Summit in Brussels, the NL-HUG will do a one-time show in Belgium. Since this is in a different setting from the usual Amsterdam meetups, talks might overlap a bit with previous events.
05.30 - 06.15: Open reception at Panoramic Hall, Level 5 at The Square Centre
06.30 - 09.00: Talks + Q&A at Studio 213, Level 2 at The Square Centre
We will have three talks:
• Hadoop – Successes, Mistakes, and Vision, by Owen O’Malley, Co-founder of Hortonworks and senior architect
I’ve been working on Hadoop since January 2006, before it was called Hadoop. Along the way, we’ve made a lot of good decisions, but we’ve also made a lot of mistakes. My talk will cover some of each and the lessons that we’ve learned as a result. Finally, I’ll share my vision of where things are going in the Hadoop ecosystem.
• Rapid Prototyping of Online Machine Learning with Divolte Collector, by Friso van Vollenhoven, CTO at GoDataDriven
• Apache Flink, by Stephan Ewen, committer on Apache Flink, co-founder @ Data Artisans
Apache Flink is a distributed data processing system that aims to unify batch and streaming data analysis at the engine level.
Flink offers powerful programming APIs in Java and Scala, based on distributed data sets and data streams, with a wide set of transformations and flexible window definitions, as well as higher-level APIs for specific use cases.
Flink backs these APIs with a robust and unique execution backend. The batch and streaming APIs are backed by the same execution engine, which has true streaming capabilities, enabling real-time stream processing and reducing latency in many batch programs. Flink implements its own memory manager and custom data processing algorithms inside the JVM, which makes the system behave robustly both in memory and under memory pressure. Flink has iterative processing built in, with native iteration operators that create dataflows with feedback. Finally, Flink contains its own cost-based optimizer, type extraction, and data serialization stack.
The end result is a platform that is fast, easy to program against, unifies batch and stream processing without compromising on latency or throughput, requires very little tuning to sustain data-intensive workloads, and solves many of the problems of heavy data processing inside the JVM. The Flink project has recently been expanding to include more high-level modules that build on top of the engine, such as a new graph processing library, a machine learning library, and a SQL-like interface for programming on tables rather than typed collections. Flink is integrated with the open source ecosystem, including Apache Hadoop (input/output formats, MapReduce API compatibility, and YARN integration), Apache Kafka, Apache Tez, Apache SAMOA, and more. Flink is also integrated with the Google Cloud Platform. Flink is a top-level Apache project with more than 85 contributors from industry and academia.
This talk gives an overview of Flink from a user perspective for both batch and stream processing, covers the most important features of Flink’s runtime and their operational benefits, and presents a roadmap of the project.