Stream processing with Apache Beam and Spark

Name: Stream processing with Apache Beam and Spark
Start: 2016-06-07T18:00:00+03:00
End: 2016-06-07T21:00:00+03:00
Location: Google office, 29th floor, Dodly

Hosted By

Demi B. and Shlomi H.

Details

18:00 - 18:30 - Mingling
18:30 - 19:15 - What we’ve learned (so far) from developing a stream-processing platform @ PayPal scale - Amit Sela @ PayPal
19:15 - 20:00 - Fundamentals of stream processing with Apache Beam - Tyler Akidau @ Google

“What we’ve learned (so far) from developing a stream-processing platform @PayPal scale”

Abstract:

PayPal is the payment industry’s leader in risk management. Using our data, machine learning, and human detective work, we are able to accurately detect fraud and separate good users from bad actors, in real time and at very large scale.

A year ago, we set out to reinvent Risk’s data platform to support PayPal’s growth and to maintain our competitive advantage in risk and fraud detection.

The first component we’re releasing handles data in motion, i.e., stream processing.

What can streaming offer as a computational platform? Where are its strengths?

How do you choose the right technology for your needs? And why did we choose Spark?

What challenges did we find with stream processing, and how did we overcome some of them? What gaps remain, and how does it all relate to “modeling” the problem of stream processing?

Where is Apache Spark going (2.0)? And how does this all come together nicely?

Bio:

Amit Sela (https://www.linkedin.com/in/amit-sela-7aa05035) is a Senior Software Engineer @ PayPal and a committer for Apache Beam, currently working on Risk’s next-generation Big Data platform with a focus on stream processing. Amit is also an open-source enthusiast who has spent the past five years working with Hadoop, HBase, Sqoop, Spark, and Kafka, and recently got the chance to give something back to the community by working on the Spark runner for Apache Beam.

Fundamentals of stream processing with Apache Beam

Abstract:
Apache Beam (http://beam.incubator.apache.org/) (unified Batch and strEAM processing!) is a new Apache Incubator project. Based on years of experience developing Big Data infrastructure within Google (such as MapReduce, FlumeJava, and MillWheel), it has now been donated to the open-source community at large.

Come learn about the fundamentals of out-of-order stream processing, and how Beam’s powerful tools for reasoning about time greatly simplify this complex task. Beam provides a model that allows developers to focus on the four important questions that must be answered by any stream processing pipeline:

What results are being calculated?

Where in event time are they calculated?

When in processing time are they materialized?

How do refinements of results relate?
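The four questions can be made concrete with a small, self-contained simulation. The sketch below is plain Python, not the Beam SDK: the record format, the names `window_start` and `run`, the 60-second window, and the 120-second allowed lateness are all illustrative assumptions. It computes a per-window sum (what), assigns elements to fixed event-time windows (where), fires an on-time pane when the watermark passes a window's end and a late pane for each late element (when), and either accumulates or discards earlier results in each refinement (how).

```python
from collections import defaultdict

# Hypothetical parameters for this sketch (not taken from the talk):
WINDOW_SIZE = 60        # Where: fixed 60-second windows in event time
ALLOWED_LATENESS = 120  # late elements are accepted up to 120 s after a window ends


def window_start(event_time):
    """Where in event time: map an element to the start of its fixed window."""
    return event_time - event_time % WINDOW_SIZE


def run(records, accumulating=True):
    """Process records in arrival (processing-time) order and return emitted panes.

    Hypothetical record format:
      ("data", event_time, value)  an element stamped with its event time
      ("watermark", t)             assertion that no more data with event time < t is expected
    Each pane is (window_start, value, timing), where timing is "ON_TIME" or "LATE".
    """
    totals = defaultdict(int)   # What: running sum per window
    pending = defaultdict(int)  # sum added since the window's last pane
    fired = set()               # windows that have already emitted a pane
    watermark = float("-inf")
    panes = []

    def emit(start, timing):
        # How: accumulating panes report the full sum so far; discarding
        # panes report only what arrived since the previous pane.
        value = totals[start] if accumulating else pending[start]
        pending[start] = 0
        fired.add(start)
        panes.append((start, value, timing))

    for record in records:
        if record[0] == "watermark":
            watermark = max(watermark, record[1])
            # When: fire one on-time pane per window once the watermark passes its end.
            for start in sorted(totals):
                if start not in fired and start + WINDOW_SIZE <= watermark:
                    emit(start, "ON_TIME")
            continue

        _, event_time, value = record
        start = window_start(event_time)
        end = start + WINDOW_SIZE
        if end + ALLOWED_LATENESS <= watermark:
            continue  # beyond allowed lateness: dropped in this sketch
        totals[start] += value
        pending[start] += value
        if end <= watermark:
            # When: the watermark already passed this window, so refine it with a late pane.
            emit(start, "LATE")
    return panes


records = [
    ("data", 5, 3),
    ("data", 70, 4),
    ("data", 30, 2),     # out of order in event time, but still on time
    ("watermark", 65),   # window [0, 60) is now complete per the watermark
    ("data", 50, 1),     # late element for window [0, 60)
    ("watermark", 130),  # window [60, 120) is now complete
    ("watermark", 300),
    ("data", 10, 100),   # window [0, 60) is past its allowed lateness: dropped
]

print(run(records))                      # [(0, 5, 'ON_TIME'), (0, 6, 'LATE'), (60, 4, 'ON_TIME')]
print(run(records, accumulating=False))  # [(0, 5, 'ON_TIME'), (0, 1, 'LATE'), (60, 4, 'ON_TIME')]
```

In the Beam Java SDK, the same four choices roughly correspond to a combine such as `Sum.integersPerKey()`, `Window.into(FixedWindows.of(...))` with `withAllowedLateness(...)`, a trigger such as `AfterWatermark.pastEndOfWindow().withLateFirings(...)`, and `accumulatingFiredPanes()` or `discardingFiredPanes()`. The sketch only mimics their behavior and simplifies details such as state cleanup and Beam's exact lateness boundary.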

Furthermore, by cleanly separating these questions from runtime characteristics, Beam programs become portable across multiple runtime environments, both proprietary (e.g., Google Cloud Dataflow) and open-source (e.g., Flink and Spark).

Bio:

Tyler Akidau (https://www.linkedin.com/in/tyler-akidau-5221672) is a Staff Software Engineer @ Google. The current tech lead for internal streaming data processing systems (e.g., MillWheel), he has spent six years working on massive-scale streaming data processing systems. He passionately believes in streaming data processing as the more general model of large-scale computation.

Group: Big Things (public group)
Address: Yigal Alon 98, Tel Aviv-Yafo