Apache Flink: Fast and Reliable Large-scale Data Processing

Details

Schedule:

6:30 pm - 7:00 pm Networking & Refreshments

7:00 pm - 8:15 pm Speakers' Presentation

8:15 pm - 8:30 pm Q & A

Join us for our first kickoff Meetup at Hortonworks' new HQ in Santa Clara!

The meetup will be held in the Mojave Training Room, adjacent to the Tech Stadium Café. Follow the parking instructions below for where to park and how to get to the building.

Maximum Capacity: 60 Attendees

Session Description:

This talk presents Apache Flink (http://flink.incubator.apache.org/) from a user's perspective. We introduce the APIs and highlight the most interesting design points behind Flink, discussing how they contribute to the goals of performance, robustness, and flexibility. We finally give an outlook on Flink’s development roadmap.

Abstract:

Apache Flink (http://flink.incubator.apache.org/) is one of the latest additions to the Apache family of data processing engines. In short, Flink’s design aims to be as fast as in-memory engines while providing the reliability of Hadoop.

Flink contains:

• (1) APIs in Java and Scala for both batch-processing and data streaming applications,

• (2) A translation stack for transforming these programs to parallel data flows,

• (3) A runtime that supports both proper streaming and batch processing for executing these data flows in large compute clusters.

Flink’s batch APIs build on functional primitives (map, reduce, join, cogroup, etc.) and augment those with dedicated operators for iterative algorithms and with support for logical, SQL-like key attribute referencing (e.g., groupBy(“WordCount.word”)). The Flink streaming API extends the primitives from the batch API with flexible window semantics.
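
To make the primitives above concrete, here is a minimal sketch in plain Python. It is not Flink's API: the helper names (flat_map, group_by, reduce_groups, tumbling_window_counts) are hypothetical and chosen only to mirror the map / groupBy / reduce style described in the abstract, and the fixed-size "tumbling" window is just one simple assumed example of a window semantic.

```python
from collections import defaultdict
from functools import reduce


def flat_map(fn, records):
    """Apply fn to every record and flatten the resulting lists."""
    return [out for rec in records for out in fn(rec)]


def group_by(key_fn, records):
    """Collect records into lists keyed by key_fn(record)."""
    groups = defaultdict(list)
    for rec in records:
        groups[key_fn(rec)].append(rec)
    return dict(groups)


def reduce_groups(fn, groups):
    """Fold each group's records pairwise into a single record."""
    return {key: reduce(fn, recs) for key, recs in groups.items()}


def word_count(lines):
    """Batch word count: flat_map to (word, 1), group by word, sum the counts."""
    pairs = flat_map(lambda line: [(w, 1) for w in line.lower().split()], lines)
    grouped = group_by(lambda pair: pair[0], pairs)
    reduced = reduce_groups(lambda a, b: (a[0], a[1] + b[1]), grouped)
    return {word: pair[1] for word, pair in reduced.items()}


def tumbling_window_counts(events, window_size):
    """Streaming-style variant: events are (timestamp, word) pairs.

    Each event is assigned to the fixed-size window containing its
    timestamp, and the same word count runs once per window.
    """
    windows = group_by(lambda ev: (ev[0] // window_size) * window_size, events)
    return {start: word_count([word for _, word in evs])
            for start, evs in windows.items()}
```

For example, word_count(["to be or not to be"]) counts each distinct word in the line, and tumbling_window_counts(events, 5) produces one such count per five-unit window of timestamps. The windowed version reuses the batch function unchanged, which is the sense in which a streaming API can extend batch primitives.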

Internally, Flink transforms the user programs into distributed data stream programs. In the course of the transformation, Flink analyzes functions and data types (using Scala macros and reflection), and picks physical execution strategies using a cost-based optimizer. Flink’s runtime is a true streaming engine, supporting both batching and streaming. Flink operates on a serialized data representation with memory-adaptive out-of-core algorithms for sorting and hashing. This makes Flink match the performance of in-memory engines on memory-resident datasets, while scaling robustly to larger disk-resident datasets.
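
The idea of a memory-adaptive, out-of-core sort over serialized data can be sketched as follows. This is an illustrative external merge sort in plain Python, not Flink's implementation: the function name adaptive_sort, the max_in_memory record-count budget (a stand-in for a real byte budget), and the use of pickle as the serialized format are all assumptions made for the example.

```python
import heapq
import pickle
import tempfile


def _spill(sorted_run):
    """Write one sorted run to a temporary file in serialized form."""
    f = tempfile.TemporaryFile()
    for rec in sorted_run:
        pickle.dump(rec, f)
    f.seek(0)
    return f


def _read_run(f):
    """Stream the records of a spilled run back, one at a time."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def adaptive_sort(records, max_in_memory):
    """Yield records in sorted order.

    If the input fits within max_in_memory records, it is sorted purely
    in memory. Otherwise full buffers are sorted and spilled to disk as
    runs, and the runs are merged lazily at the end.
    """
    runs = []
    buffer = []
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= max_in_memory:
            buffer.sort()
            runs.append(_spill(buffer))
            buffer = []
    buffer.sort()
    if not runs:
        # Memory-resident case: no disk I/O at all.
        yield from buffer
        return
    try:
        # Disk-resident case: k-way merge of the spilled runs plus the tail.
        yield from heapq.merge(buffer, *(_read_run(f) for f in runs))
    finally:
        for f in runs:
            f.close()
```

With a generous budget, list(adaptive_sort(data, 1_000_000)) never touches disk; with a small budget the same call spills runs and merges them, returning the same sorted result. That single code path for both cases is the property the paragraph above describes.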

Finally, Flink is compatible with the Hadoop ecosystem. Flink runs on YARN, reads data from HDFS and HBase, and supports mixing existing Hadoop Map and Reduce functions into Flink programs. Ongoing work is adding Apache Tez as an additional runtime backend.

Speaker Bio:

Kostas Tzoumas is a committer at Apache Flink and co-founder of data Artisans (data-artisans.com (http://data-artisans.com/)), a Berlin-based company that develops and contributes to Apache Flink. Before founding data Artisans, Kostas was a postdoctoral researcher at TU Berlin. He received a PhD in Computer Science from Aalborg University and spent time at the University of Maryland, College Park, and at Microsoft Research in Redmond over the course of several internships.

Stephan Ewen is a committer at Apache Flink and co-founder of data Artisans (data-artisans.com (http://data-artisans.com/)), a Berlin-based company that develops and contributes to Apache Flink. Before founding data Artisans, Stephan led the development of Flink from the early days of the project (then called Stratosphere) at TU Berlin. Stephan has a PhD in Computer Science from TU Berlin and spent time at IBM Almaden Research and Microsoft Research over the course of several internships.

PARKING INSTRUCTIONS:

When you arrive, drive along the driveway, past the garage, past the Dell building, to the large parking lot. To the far right, along the grass bank, are parking slots reserved for “HW” (Hortonworks). Look for an open “HW” spot and park your car, or ask a valet in a red jacket, who will guide you to a spot. Tell the valet you're here for a Meetup.

From the parking lot, walk back to the Tech Stadium Café and up the short stairs. Adjacent to the Tech Stadium Café is an outside entrance to the Mojave Conference Learning Center. (You’ll notice “Hortonworks Shuttle Pick Up and Drop” signs at the bottom of the short stairs.)

See you there.

Apache Tez User Group
Hortonworks HQ
5470 Great America Parkway · Santa Clara, CA