Learn About Stream & Batch Processing with Apache Beam

This is a past event

135 people went


Details

Join Cloud Mafia for another great tech talk!

This event is free to attend; please RSVP!

Agenda:

6:00pm-6:30pm Networking, food & drinks!

6:30pm-7:30pm Tech talks

7:30pm-8:30pm Q&A and networking

Talk #1

Realizing the Promise of Portability with Apache Beam

Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust batch and stream data processing applications, in a variety of languages, across a variety of platforms. By cleanly separating the user's processing logic from the details of the underlying execution engine, the same pipelines will run in any Apache Beam runtime environment, whether on-premises or in the cloud, on open source frameworks like Apache Spark or Apache Flink, or on managed services like Google Cloud Dataflow.

We will provide an overview of the basic data processing concepts in Beam, including the APIs for robust stream data handling. We will also provide an overview of the Beam portability API, how it enables job execution across different processing engines as well as SDKs in different languages, and the benefits this provides.

Anand Iyer, Product Manager @ Google

Anand Iyer is a Product Manager at Google, focused on making Google's industry-leading big-data infrastructure available to the world via Google Cloud. He is passionate about delivering tools and platforms that make it easy to derive insights from massive volumes of data.

Talk #2

Exactly Once Stream Processing in Google Cloud Dataflow

The popularity of stream data platforms is growing fast. Several companies are transitioning parts of their data infrastructure to a streaming paradigm so that they have access to real-time information and can derive insights from their data in real time. To author mission-critical applications in a streaming paradigm, it is essential to have a system that guarantees "exactly once" semantics. However, many general-purpose streaming engines either fail to provide this guarantee or put the burden of achieving it on the application author. With Google Cloud Dataflow, by contrast, customers can author streaming pipelines with end-to-end exactly-once semantics out of the box. In our talk we will start by defining exactly-once semantics and illustrate why they are essential. Then we will describe how Google Cloud Dataflow provides this guarantee, out of the box, without compromising performance.
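A toy sketch of the problem the talk addresses (this is an illustration of the concept, not Dataflow's actual mechanism): with at-least-once delivery, a retry can redeliver a record, so a naive aggregation double-counts; deduplicating on a unique record ID before aggregating restores an exactly-once result. The `count_events` helper and the record IDs below are hypothetical:

```python
def count_events(deliveries, dedupe=False):
    """Sum values from (record_id, value) deliveries.

    With dedupe=False, redelivered records are counted again
    (at-least-once). With dedupe=True, each record_id contributes
    exactly once, regardless of how many times it is delivered.
    """
    seen = set()
    total = 0
    for record_id, value in deliveries:
        if dedupe:
            if record_id in seen:
                continue  # duplicate redelivery: drop it
            seen.add(record_id)
        total += value
    return total


# A retry redelivers record 2:
deliveries = [(1, 10), (2, 5), (2, 5), (3, 7)]
naive = count_events(deliveries)                 # double-counts record 2
exactly_once = count_events(deliveries, dedupe=True)
```

Here `naive` is 27 while `exactly_once` is 22; a production system has to provide this deduplication (and make it survive failures) without asking the pipeline author to write it by hand.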

Ahmet Altay, Python SDK Tech Lead

Ahmet Altay is a software engineer working on Google Cloud Dataflow and a PMC member of Apache Beam. He loves all the users who make data processing fun, and he is passionate about improving their experiences.

*Food and drinks provided by Google