Toronto Apache Spark #11

Hosted By
Mehrdad P. and Armando B.
Details

Experiences in Delivering Spark as a Service

Agenda:

6:30PM to 6:50PM - Opening and networking

6:50PM to 7:00PM - HackOn(Data)

7:00PM to 7:50PM - Spark Stories - Wattpad

8:00PM to 8:50PM - Experiences in Delivering Spark as a Service - IBM

8:50PM to 9:00PM - Networking

Broadcast Link:

Spark Stories

Wattpad has a long history with Spark. We use it to quickly develop and experiment with our content discovery systems for user-generated stories. In this talk we will explain how we leverage our data and what sorts of issues we face. We will describe the recommendation and search algorithms we use to enable personalized and diverse content discovery, and how we have integrated with Spark to deliver fast results when dealing with hundreds of millions of items.

Speakers:

Rylan Halteman (https://ca.linkedin.com/in/rylanhalteman) is a Data Engineer at Wattpad, where he designed and implemented Wattpad's internal metrics, writer analytics, and experimentation systems. He is currently focused on making data and experimentation more accessible and streamlined.

Since we're telling many Spark stories, we actually have 4 speakers. The other 3 are:

Mo Islam, Data Scientist
Joel Oren, Research Scientist
Adriel Dean-Hall, Search Architect

Experiences in Delivering Spark as a Service

The back-end architecture for the public CDS Spark service in IBM BlueMix is powered by IBM Spectrum Conductor with Spark technology. In this presentation, we will demonstrate the advantages of this architecture, which allocates resources dynamically based on the workload demands of multiple Spark tenants (in contrast to the common cloud service architecture of provisioning a pre-deployed cluster per tenant) and auto-scales the cluster based on computation capacity and billing policies. We will also review some of the architectural challenges of scaling to thousands of Spark tenants in terms of performance, security requirements, data isolation, and manageability.

Speakers:

Michael Feiman (https://ca.linkedin.com/in/michael-feiman-9376291) is an STSM and Product Architect for IBM Spectrum Conductor for Spark at IBM Spectrum Computing. He works on the design and architecture of HPC and Big Data products and of customer solutions for large-scale on-premises and cloud computing systems, with a focus on scheduling, resource, and workload management. He has been working in software development, product management, and architecture for over 20 years. His areas of interest are Big Data, specifically Spark and related projects, and cloud technologies. Michael has an M.Sc. in Electrical Engineering.

Khalid Ahmed (https://ca.linkedin.com/in/khalidahmed4295) is an STSM and Chief Architect of Infrastructure Software at IBM Spectrum Computing. He works on the design and architecture of large-scale grid and cloud computing systems, with a focus on scheduling, resource, workload, and data management. With over 20 years of industry experience, he has worked in a number of roles, including development, product management, and architecture. His latest interests include big data systems, container technology, and data center operating system concepts. Khalid has an M.A.Sc. from the University of Toronto.

Level: Intermediate/Advanced

Target Audience: Data Scientists, Data Engineers

Sponsor:

http://hackondata.com/assets/img/sponsors/wattpad.jpg

PipelineAI Advanced Spark and TensorFlow Meetup (Toronto)
Wattpad
36 Wellington Street East, Suite 200 · Toronto, ON