Spark and Verizon

Hosted by Scott Walent

Details

We will be filming the meetup for later posting on the Apache Spark YouTube page.

Pizza and soft drinks will be served.

Please bring a photo ID to enter the building.

Agenda:

6:30 - 7:00 :: Mingling

7:00 - 7:05 :: Welcome

7:05 - 8:15 :: Technical talk

8:15 - 9:00 :: Mingling

In this meetup, we will have a series of talks on how Spark is being used at Verizon’s two development groups: Verizon Lab Big Data Analytics (BDA) and OnCue.

  1. Online file processing using Apache Spark (BDA)
     Apache Spark has been rapidly adopted into production systems for its expressive and fast data processing features. In this talk, we will introduce one area where Verizon is using Spark for mission-critical data ingestion at scale and focus on code patterns that lead to stable, long-term operation. The talk will cover the complexities of implementing a multi-threaded driver, exception handling, and graceful shutdowns.

  2. ML algorithms on large-scale similarity computation and matrix factorization (BDA)
     In this talk we will discuss two areas of development related to Spark MLlib. 1) MLlib introduced columnSimilarity to find similar columns for tall, skinny matrices. Here we introduce rowSimilarity in MLlib IndexedRowMatrix to improve stability and runtime when the matrix gets wider (more than 1 million columns). The implementation also supports pluggable distance functions such as cosine, Euclidean, and RBF, and can therefore be used in other kernel-based algorithms. We will present a runtime analysis of the column and row similarity flows and their application to k-nearest-neighbor calculation and content-based user-to-item recommendation. 2) We will discuss some extensions we made to MLlib's ALS-based matrix factorization to handle positivity and simplex constraints using Breeze QuadraticMinimizer. We will demonstrate the runtime impact of adding constraints to ALS and its application to automatic segment generation.

  3. Lessons on using Spark and Mesos (OnCue)
     Building a next-generation over-the-top TV service from the ground up presents several challenges. In this talk we will focus on the story behind handling the complexity of building data pipelines, and how Spark on Mesos has been an answer to this problem. We will discuss our experiences integrating Spark on Mesos, as well as optimizing performance and resource utilization on our clusters. We will also go over some gotchas we encountered in our systems while developing analytics, recommendation, and advertising applications.

Bay Area Spark Meetup
Verizon
375 W Trimble Rd (Garage area) · San Jose, CA