Spark at Microsoft Extravaganza

Name: Spark at Microsoft Extravaganza
Start: 2016-06-29T17:30:00-07:00
End: 2016-06-29T20:30:00-07:00
Location: Microsoft City Center (Room 2130-2150)

Hosted by Denny L. and 3 others

Meet the group

Seattle Spark+AI Meetup

Details

We have three awesome sessions for the next installment of our Spark at Microsoft Extravaganza!

Agenda

• 5:30 pm Doors Open

• 5:30 to 6:00 pm Check-in, Food+Drinks, Networking

• 6:00 to 8:00 pm Three Sessions (30 to 40 minutes each)

• 8:00 to 8:30 pm Networking

The sessions are:

Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

While building an intelligent monitoring and alerting system for Office365 service quality and user experience on top of Spark Streaming, we found that most of our monitoring logic needs to use event application time: mostly aggregates and temporal joins over windows of different event types, for repeatability and cross-signal correlation. Native Spark Streaming supports only wall-clock windowing operators, which is insufficient for most of our scenarios. The Office365 team and the Azure Streaming Analytics team have therefore been working together to create a set of temporal operators (e.g. reorder, aggregate, and temporal joins, all by event application time) on top of Spark Streaming to support our complex monitoring logic at scale. The Azure Streaming Analytics team has worked for years on advanced streaming programming models and implementations, while the Office365 team has a strong need to scale its monitoring and alerting infrastructure for service quality and user experience by leveraging an open source stack (Kafka/Spark/Cassandra).

At Spark Summit 2016, we presented the core concepts and the streaming programming model of the temporal operators. In this talk, we will go one level deeper and analyze two approaches to processing out-of-order events: reorder then process, versus handling out-of-order events inside the operators. We will enumerate the problems of in-memory state size and amount of computation performed, as well as the dry shard problem that arises when a high water mark is used to move the timeline forward. A more detailed analysis of these problems will be covered in a future talk.

Speaker: Zhong Chen, Microsoft
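
For readers unfamiliar with the "reorder then process" approach named in the abstract above, here is a minimal sketch in plain Python. It is not the speakers' implementation and does not use Spark. The function names, the `max_delay` bound on lateness, and the choice to drop events older than the watermark are all assumptions made for illustration.

```python
import heapq


def reorder_by_high_water_mark(events, max_delay):
    """Yield (event_time, value) pairs in event-time order.

    events: iterable of (event_time, value) in arrival order.
    max_delay: assumed bound on lateness (hypothetical parameter).
    """
    buffer = []  # min-heap ordered by event time
    high_water_mark = float("-inf")
    seq = 0  # tie-breaker so the heap never compares values
    for event_time, value in events:
        high_water_mark = max(high_water_mark, event_time)
        watermark = high_water_mark - max_delay
        if event_time < watermark:
            continue  # assumption: events later than max_delay are dropped
        heapq.heappush(buffer, (event_time, seq, value))
        seq += 1
        # Release everything the watermark has already passed.
        while buffer and buffer[0][0] <= watermark:
            t, _, v = heapq.heappop(buffer)
            yield t, v
    # End of input: flush whatever is still buffered, in order.
    while buffer:
        t, _, v = heapq.heappop(buffer)
        yield t, v


def tumbling_counts(ordered_events, window_size):
    """Count events per tumbling event-time window, keyed by window start."""
    counts = {}
    for event_time, _ in ordered_events:
        start = (event_time // window_size) * window_size
        counts[start] = counts.get(start, 0) + 1
    return counts


arrivals = [(1, "a"), (3, "b"), (2, "c"), (7, "d"), (4, "e"), (12, "f"), (5, "g")]
ordered = list(reorder_by_high_water_mark(arrivals, max_delay=3))
print(ordered)  # (5, "g") arrives after the watermark reached 9 and is dropped
print(tumbling_counts(ordered, window_size=5))  # {0: 4, 5: 1, 10: 1}
```

In this sketch the buffer is the in-memory state, and it grows with `max_delay`. The watermark also advances only when a new event arrives, so an idle input holds back buffered events until the final flush. Both points loosely echo the state-size and dry shard concerns listed in the abstract, though the talk may define them differently.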

Spark in YARN-managed multi-tenant clusters

Spark’s YARN support allows scheduling Spark workloads on Hadoop alongside a variety of other data-processing frameworks. We will take a deep dive into how Spark works on YARN and why we chose YARN as our preferred cluster manager. We will share our insights on how we achieved multi-tenancy and maximized cluster resource utilization while ensuring minimum resources for each application, using Spark dynamic executor allocation and YARN schedulers on Spark HDI clusters.

Speakers: Pravin Mittal, Rajesh Iyer, Microsoft
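
As a rough illustration of the settings the abstract above alludes to, the sketch below assembles a `spark-submit` command line that turns on dynamic executor allocation and targets a YARN queue. The property names are standard Spark configuration keys. The helper function, the queue name, and the executor bounds are hypothetical and are not taken from the speakers' HDI setup.

```python
def spark_submit_args(queue, min_executors, max_executors):
    """Build spark-submit arguments for dynamic allocation on YARN.

    All three parameters are illustrative values chosen by the caller.
    """
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    conf = {
        # Let Spark grow and shrink the executor count with the workload.
        "spark.dynamicAllocation.enabled": "true",
        # The external shuffle service keeps shuffle files available
        # after an executor is released.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # YARN scheduler queue the application is submitted to.
        "spark.yarn.queue": queue,
    }
    args = ["spark-submit", "--master", "yarn"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# "tenant_a" is a made-up queue name for this example.
print(" ".join(spark_submit_args("tenant_a", min_executors=2, max_executors=20)))
```

The minimum executor count gives each application a floor, while the queue's capacity in the YARN scheduler limits how much of the cluster one tenant can take. How the speakers actually tuned these is a subject of the talk and is not shown here.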

Five Lessons Learned In Building Streaming Applications At Microsoft Bing Scale

Hundreds of millions of search queries hit Bing.com every day. To enable teams in Bing to monitor and analyze user engagement and act upon revenue opportunities in markets around the world, the Shared Data Team must collect the logs and signals associated with every single search query, then process and enrich the data in near real time. Apache Spark Streaming is the solution that empowers us to fulfill this mission. In this presentation, we will walk through the top five lessons we learned in building and running large-scale streaming applications successfully in production.

Speaker: Renyi Xiong, Microsoft
