Spark Streaming @ Expedia and Implementing Spark Streaming Connector

Hosted By
Denny L. and 2 others
Details

For March we have a Spark Streaming theme with two technically oriented talks. Thank you, Brandon and Expedia, for hosting and feeding us at this event!

Agenda

6:00-6:30 Food, drinks, social!

6:30-6:35 Welcome and Logistics

6:35-7:35 Main event: Spark Streaming for production analytics systems at Expedia by Brandon O'Brien

7:35-8:05 Implementing Spark Streaming Connector for Azure Event Hubs and HDInsight by Arijit Tarafdar, Nan Zhu

Spark Streaming for production analytics systems at Expedia by Brandon O'Brien

At Expedia we use Spark Streaming for several of our streaming analytics systems in production, for multiple use cases. In this presentation, I'll share some of the best practices we've developed for running Spark Streaming in production, primarily to address concerns such as performance, stability and monitoring.

Topics include:

  • Spark Streaming overview and standalone clusters
  • Design patterns for performance
  • Spark cluster and app stability
  • Direct Kafka integration
  • Guaranteed message processing
  • Operational monitoring

About Brandon

Brandon O’Brien is a Principal Software Engineer at Expedia who is leveraging Spark and other related technologies to build large-scale streaming processing systems for travel market analytics. Contact:
https://twitter.com/hakczar
https://www.linkedin.com/in/brandonjobrien

Implementing Spark Streaming Connector for Azure Event Hubs and HDInsight by Arijit Tarafdar, Nan Zhu

One of the biggest challenges in data science is building a continuous data application that delivers results rapidly and reliably. Spark Streaming offers a powerful solution for real-time data processing. However, the challenge remains of how to connect it to various continuous, real-time data sources while guaranteeing the responsiveness and reliability of data applications.

In this talk, we will summarize lessons learned from serving real-time, Spark-based data analytics solutions on Azure HDInsight (https://azure.microsoft.com/en-us/services/hdinsight/). Our solution (https://github.com/hdinsight/spark-eventhubs) seamlessly integrates Spark with Azure Event Hubs, a hyper-scale telemetry ingestion service that lets users ingest massive amounts of telemetry into the cloud and read the data from multiple applications using publish-subscribe semantics.

We will cover three topics: bridging the gap between the data communication models of Spark and the data source, adapting Spark to the data source's rate control and message addressing, and co-designing fault-tolerance mechanisms. We hope this talk will share insights on how to build continuous data applications with Spark and encourage the development of more connectors between Spark and different real-time data sources.

Arijit Tarafdar, Nan Zhu - Software Engineers, Spark @ Microsoft

Seattle Spark+AI Meetup
Expedia Building
333 108th Ave NE · Bellevue, WA