Building Big Data applications with Apache Beam and Apache Apex (native Hadoop)


Details
The event will cover building streaming applications with Apache Beam and Apache Apex.
Apache Apex is a native Hadoop data-in-motion platform used by customers for both streaming and batch processing. Common use cases include ingestion into Hadoop, streaming analytics, ETL, database off-loads, alerts and monitoring, machine learning model scoring, and more.
Agenda:
5:45pm - Food & Networking
6:15pm - Kenn Knowles, Software Engineer, Google & Apache Beam (incubating) PPMC member
7:00pm - Q&A
7:15pm - Siyuan Hua, Apache Apex Committer & DataTorrent Engineer
8:00pm - Food & Networking
Abstract:
Talk 1: Apache Beam (incubating) is a programming model and library for unified batch & streaming big data processing. This talk will cover the Beam programming model broadly, including its origin story and vision for the future. We will dig into how Beam separates concerns for authors of streaming data processing pipelines, isolating what you want to compute from where your data is distributed in time and when you want to produce output. Time permitting, we might dive deeper into what goes into building a Beam runner, for example atop Apache Apex.
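As a rough illustration of that separation of concerns, here is a minimal Beam Java pipeline sketch. It is not from the talk: the input path is hypothetical, and the method names assume a recent Beam Java SDK rather than the incubating release discussed at the meetup.
```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

public class WindowedLineCount {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(TextIO.read().from("input.txt"))  // hypothetical input path
        // WHERE in event time: assign elements to fixed one-minute windows
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
        // WHAT to compute: count occurrences of each distinct line per window
        .apply(Count.perElement())
        // Print each (line, count) pair; a real pipeline would write to a sink
        .apply(ParDo.of(new DoFn<KV<String, Long>, Void>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            System.out.println(c.element().getKey() + ": " + c.element().getValue());
          }
        }));

    p.run().waitUntilFinish();
  }
}
```
The same pipeline can then be submitted to different runners (for example the Apex runner) without changing the "what" or "where" logic.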
Talk 2: Apache Apex provides a DAG construction API that gives developers full control over the logical plan. Some use cases don't require all of that flexibility, or at least it may appear so initially. A large part of the audience may also be more familiar with APIs that have a more functional programming flavor, such as the new Java 8 Stream interfaces and the Apache Flink and Spark Streaming APIs. To help Apex beginners get a simple first application running with a familiar API, we are now providing a Stream API on top of the existing DAG API. The Stream API is designed to be easy to use, yet flexible to extend and compatible with the native Apex API. This means developers can construct their applications in a style similar to Flink or Spark, but still have the power to fine-tune the DAG at will. Per our roadmap, the Stream API will closely follow the Apache Beam (a.k.a. Google Dataflow) model. In the future, you should be able to either easily run Beam applications on the Apex engine or express an existing application in a more declarative style. A sketch of the underlying DAG API follows below.
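For context, here is a minimal sketch of the low-level DAG construction API that the Stream API builds on. The two operators (RandomNumberGenerator, ConsolePrinter) are hypothetical placeholders written for this sketch, not operators from the Malhar library.
```java
import org.apache.hadoop.conf.Configuration;

import com.datatorrent.api.DAG;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;
import com.datatorrent.common.util.BaseOperator;

@ApplicationAnnotation(name = "MyFirstApexApp")
public class Application implements StreamingApplication {

  // Source operator (hypothetical): emits a random number on each call.
  public static class RandomNumberGenerator extends BaseOperator implements InputOperator {
    public final transient DefaultOutputPort<Double> out = new DefaultOutputPort<>();

    @Override
    public void emitTuples() {
      out.emit(Math.random());
    }
  }

  // Sink operator (hypothetical): prints every tuple it receives.
  public static class ConsolePrinter extends BaseOperator {
    public final transient DefaultInputPort<Double> in = new DefaultInputPort<Double>() {
      @Override
      public void process(Double tuple) {
        System.out.println(tuple);
      }
    };
  }

  @Override
  public void populateDAG(DAG dag, Configuration conf) {
    // Full control over the logical plan: operators and streams are wired explicitly.
    RandomNumberGenerator gen = dag.addOperator("gen", new RandomNumberGenerator());
    ConsolePrinter console = dag.addOperator("console", new ConsolePrinter());
    dag.addStream("numbers", gen.out, console.in);
  }
}
```
The Stream API aims to express this kind of wiring as a fluent chain of transformations while still compiling down to the same logical DAG.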
For deeper engagement with Apache Apex (http://apex.apache.org/):
Follow @ApacheApex: https://twitter.com/apacheapex
Presentations: http://www.slideshare.net/ApacheApex
Recordings: https://www.youtube.com/user/datatorrent
DataTorrent downloads: community (https://www.datatorrent.com/download/datatorrent-community-edition-download-meetups/), sandbox (https://www.datatorrent.com/download/datatorrent-rts-sandbox-edition-download-meetups/)
Apache Apex releases: http://apex.apache.org/downloads.html
Docs: http://apex.apache.org/docs.html
