Spark Streaming @ Expedia and Implementing Spark Streaming Connector

Name: Spark Streaming @ Expedia and Implementing Spark Streaming Connector
Start: 2017-03-23T18:00:00-07:00
End: 2017-03-23T21:00:00-07:00
Location: Expedia Building

Hosted by Denny L. and 2 others

Seattle Spark+AI Meetup

Details

For March, we have a Spark Streaming theme with two technically oriented talks. Thank you, Brandon and Expedia, for hosting and feeding us at this event!

Agenda

6:00-6:30 Food, drinks, social!

6:30-6:35 Welcome and Logistics

6:35-7:35 Main event: Spark Streaming for production analytics systems at Expedia by Brandon O'Brien

7:35-8:05 Implementing Spark Streaming Connector for Azure Event Hubs and HDInsight by Arijit Tarafdar and Nan Zhu

Spark Streaming for production analytics systems at Expedia by Brandon O'Brien

At Expedia we use Spark Streaming for several of our streaming analytics systems in production, for multiple use cases. In this presentation, I'll share some of the best practices we've developed for running Spark Streaming in production, primarily to address concerns such as performance, stability and monitoring.

Topics included:

Spark Streaming overview and standalone clusters
Design patterns for performance
Spark cluster and app stability
Direct Kafka integration
Guaranteed message processing
Operational monitoring
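The abstract names direct Kafka integration and guaranteed message processing but shows no code. As a rough illustration of the general idea (not Expedia's implementation), the minimal Python sketch below pins each micro-batch to an explicit per-partition offset range and commits offsets only after the output step succeeds, which yields at-least-once delivery. `PartitionLog` and `run_batch` are hypothetical names, and the in-memory log is an assumed stand-in for Kafka partitions, not a Spark or Kafka API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PartitionLog:
    """Hypothetical in-memory stand-in for one Kafka partition (assumption)."""
    messages: List[str] = field(default_factory=list)

    def latest_offset(self) -> int:
        # Offset that the next appended message will receive
        return len(self.messages)

    def read(self, start: int, end: int) -> List[str]:
        # Messages whose offsets fall in [start, end)
        return self.messages[start:end]


def run_batch(logs: Dict[int, PartitionLog],
              committed: Dict[int, int],
              sink: Callable[[List[str]], None]) -> Dict[int, int]:
    """Process one micro-batch and return the new committed offsets.

    Offsets advance only after the sink succeeds; if the sink raises,
    the caller keeps the old offsets and re-reads the same range.
    """
    # 1. Fix the batch's offset range per partition up front
    #    (partitions with no committed offset start at 0)
    ranges = {p: (committed.get(p, 0), log.latest_offset())
              for p, log in logs.items()}
    # 2. Read exactly those ranges, in partition order
    batch = [msg for p, (start, end) in sorted(ranges.items())
             for msg in logs[p].read(start, end)]
    # 3. Write the output; an exception here skips the commit below
    sink(batch)
    # 4. Commit the end of each range only after the output succeeded
    return {p: end for p, (_, end) in ranges.items()}
```

For example, with `logs = {0: PartitionLog(["a", "b"]), 1: PartitionLog(["c"])}`, calling `run_batch(logs, {}, out.extend)` writes `["a", "b", "c"]` to `out` and returns `{0: 2, 1: 1}`. Because a failed sink leads to replays, making the output idempotent is a common way to keep those replays harmless.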

About Brandon

Brandon O’Brien is a Principal Software Engineer at Expedia who uses Spark and related technologies to build large-scale stream processing systems for travel market analytics. Contact:
https://twitter.com/hakczar
https://www.linkedin.com/in/brandonjobrien

Implementing Spark Streaming Connector for Azure Event Hubs and HDInsight by Arijit Tarafdar and Nan Zhu

One of the biggest challenges in data science is building continuous data applications that deliver results rapidly and reliably. Spark Streaming offers a powerful solution for real-time data processing. However, the challenge remains of how to connect it to various continuous, real-time data sources while guaranteeing the responsiveness and reliability of data applications.

In this talk, we will summarize the lessons we learned from serving real-time Spark-based data analytics solutions on Azure HDInsight (https://azure.microsoft.com/en-us/services/hdinsight/). Our solution seamlessly integrates Spark with Azure Event Hubs, a hyper-scale telemetry ingestion service that enables users to ingest massive amounts of telemetry into the cloud and read the data from multiple applications using publish-subscribe semantics (https://github.com/hdinsight/spark-eventhubs).

We will cover three topics: bridging the gap between the data communication models of Spark and the data source; adapting Spark to the data source's rate control and message addressing; and co-designing fault-tolerance mechanisms. We hope this talk shares insights on how to build continuous data applications with Spark and encourages the development of more connectors between Spark and different real-time data sources.
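To make the rate-control and message-addressing topics concrete, here is a minimal Python sketch, assuming messages are addressed by (partition, offset) and that each batch may read at most a fixed number of messages per second from each partition. `plan_offset_ranges` and its parameters are hypothetical; the cap is similar in spirit to Spark's `spark.streaming.kafka.maxRatePerPartition` setting for Kafka and does not represent the Event Hubs connector's actual API.

```python
import math
from typing import Dict, Tuple


def plan_offset_ranges(committed: Dict[str, int],
                       latest: Dict[str, int],
                       max_rate_per_partition: float,
                       batch_interval_s: float) -> Dict[str, Tuple[int, int]]:
    """Plan a [start, end) offset range per partition for the next batch.

    Hypothetical helper: each partition contributes at most
    max_rate_per_partition * batch_interval_s messages per batch, so a
    large backlog is drained over several batches rather than one.
    """
    # Per-partition message cap for one batch (at least one message)
    cap = max(1, math.floor(max_rate_per_partition * batch_interval_s))
    ranges = {}
    for partition, start in committed.items():
        # Never read past the newest available offset in this partition
        end = min(latest[partition], start + cap)
        ranges[partition] = (start, end)
    return ranges


if __name__ == "__main__":
    # Partition "0" has a large backlog; partition "1" is nearly caught up
    plan = plan_offset_ranges({"0": 100, "1": 0}, {"0": 5000, "1": 30},
                              max_rate_per_partition=100, batch_interval_s=10)
    print(plan)  # {'0': (100, 1100), '1': (0, 30)}
```

Planning explicit offset ranges before reading keeps each batch bounded and replayable: after a failure, the same ranges can be re-read from the source.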

Arijit Tarafdar, Nan Zhu - Software Engineers, Spark @ Microsoft