I am very pleased to announce our 3rd Apache Flink Meetup in Munich. This time we're happy to welcome David Anderson of data Artisans (http://data-artisans.com/), who will give a preview of Flink 1.3. The second talk by Konstantin Knauf will go into more detail about Queryable State and show how this feature can change the future of streaming architecture. Finally, Bernhard Schäfer will share some of his lessons on running Spark Streaming in production.
18.30 – Welcome, Pizza, Beer
19.00 – Talk 1 (David Anderson): TBD (Flink 1.3)
19.30 – Talk 2 (Konstantin Knauf): Queryable State or How to Build a Billing System Without a Database
20.00 – Talk 3 (Bernhard Schäfer): 24/7 Spark Streaming on YARN in Production
20.30 – Networking, Discussion, Drinks
As always, this Meetup is free for all attendees, and food and beverages are provided by our host, TNG Technology Consulting GmbH (http://www.tng.tech).
Apache Flink: What have the squirrels been up to?, David Anderson, data Artisans GmbH
David Anderson from data Artisans surveys some of the latest and greatest features in Apache Flink, and their impact on application developers.
Queryable State or How to Build a Billing System Without a Database, Konstantin Knauf, TNG Technology Consulting GmbH
Traditionally, big data applications rely on the Lambda Architecture in order to achieve low latency as well as completeness. A streaming layer provides real-time previews while a complementary batch layer retrospectively recomputes the correct results. Using a robust stream processor like Apache Flink, we can do without the latter. But can we take it even one step further? This talk will discuss one of the upcoming features of Apache Flink with the potential to do just that.
As a real-world example, we have built a prototype for a robust billing system based on Flink and Queryable State. On the one hand, the system exposes the current monthly subtotals in real time to front-end applications; on the other hand, it reports the complete results to downstream systems, e.g. for invoicing. As completeness and correctness are core requirements for a billing system, we will demonstrate the system in multiple failure scenarios, including TaskManager and JobManager failures as well as unavailability of downstream systems.
This talk will give you an idea of how "Queryable State" combined with a robust stream processor enables new streaming use cases and changes the future of streaming application architecture.
24/7 Spark Streaming on YARN in Production, Bernhard Schäfer, inovex GmbH
For a big client in the German food retailing industry, we have been running Spark Streaming on YARN in production for more than a year. Overall, Spark Streaming has proven to be a flexible, robust and scalable streaming engine. However, one can tell that streaming itself has been retrofitted into Spark. Many of the default configurations are not suited for a 24/7 streaming application. The same applies to YARN, which was not primarily designed with long-running applications in mind. In this talk we summarize the lessons learned while running 24/7 Spark Streaming on YARN in production. In the first part we will briefly look at the use case and the architecture with HBase as the sink system. The second part details Spark's streaming aspects such as configuration, deployment, monitoring and exactly-once processing with HBase.