Large scale Distributed ML on Spark

Hosted by Scott Walent

Details

This meetup will focus on large-scale distributed ML on Spark. The title, abstract, and speaker bio are below. The talk will be published on the Apache Spark (https://www.youtube.com/user/TheApacheSpark) channel on YouTube.

Agenda:

6:30 - 7:00 :: Mingling

7:00 - 7:05 :: Welcome

7:05 - 8:15 :: Technical talk

8:15 - 9:00 :: Mingling

Title: Large scale Distributed ML on Spark

Abstract: Hadoop brought MapReduce and Big Data to the mainstream; however, as requirements and usage models expand, new big data analysis paradigms beyond MapReduce have inevitably emerged. In particular, there is increasing demand from organizations to discover and explore data using advanced analytics algorithms (e.g., large-scale machine learning, graph analysis, statistical modeling) for deep insights. In this talk, we will present our efforts on building large-scale distributed ML on Apache Spark with many "web-scale" companies, including very complex and advanced analytics applications and algorithms (e.g., topic modeling, deep neural networks), as well as a massively scalable learning system and platform that leverages both application-specific and infrastructure-specific optimizations (e.g., exploiting data sparsity, parameter servers).

Speaker: Jason Dai is currently the Chief Architect of Big Data Technologies at Intel. Prior to that, he was a Principal Architect at Microsoft, responsible for building large-scale Cloud and Big Data platforms that power some of the largest Internet services in the company. Before joining Microsoft, he was an Engineering Director and Principal Engineer at Intel, responsible for advanced research and development of Big Data platforms, including joint development with UC Berkeley on the next generation of Big Data technologies (e.g., the Apache Spark stack), and building next-generation Big Data platforms for some of the largest websites in the world. Jason is an internationally recognized expert on big data, cloud, parallel computing, and compiler technologies. He is a PMC member of the Apache Spark project, and has published over 15 technical papers, filed over 20 patents, and taught computer classes in top universities.

Parking location:

http://photos3.meetupstatic.com/photos/event/3/b/6/8/600_440895208.jpeg

Bay Area Spark Meetup
Intel Corp, SC12 Auditorium
3600 Juliette Lane · Santa Clara, CA