Baidu and Spark


Agenda, detailed abstract and bio's below. We will be filming this meet-up and will post the video on the Apache Spark YouTube page. There is limited visitor parking, street parking is available.
6:30-7:00pm: Registration and Mingling
7:00-7:05pm: Introductions
7:05-8:15pm: Technical Talk
8:15-9:00pm: Mingling
Abstract: In this meetup, we present SparkONE, Baidu's big initiative to use Spark as the backbone of our new distributed computing platform. Our goal is to build an integrated, end-to-end platform to support big data intelligent applications. To illustrate the power of SparkONE, we present an example of using it to build a CTR prediction system for our image search and monetization product.
We will explain in detail, including:
- processing multimedia data in feature extraction and ranking on Spark (by Quan Wang)
- tackling the performance and scalability challenges in model training with our highly-optimized in-house Logistic Regression algorithm and our simple model training system (by Tianbing Xu and Quan Wang)
- our efforts to enable Baidu's deep learning system on Spark (by Ning Qu and Weide Zhang)
Our experiences and success stories illustrate that Spark can be an efficient and versatile platform for a wide range of big data applications.
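To give a flavor of the CTR-prediction workload discussed above, here is a minimal, self-contained sketch of logistic regression trained by stochastic gradient descent on sparse features. This is not Baidu's implementation (their in-house algorithm and its Spark distribution are not public); it is an illustrative toy in plain Python, with all function names and the toy data invented for this example. In a Spark setting, the per-sample gradient step below is what gets parallelized across partitions.

```python
import math

def sigmoid(z):
    # Clamp the logit for numerical stability before exponentiating.
    return 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))

def train_lr(samples, dim, epochs=20, lr=0.1):
    """Train logistic regression by SGD.

    samples: list of (features, label) pairs, where features is a sparse
    dict mapping feature index -> value and label is 0 or 1.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for feats, label in samples:
            p = sigmoid(sum(w[i] * v for i, v in feats.items()))
            g = p - label  # gradient of the log loss w.r.t. the logit
            for i, v in feats.items():
                w[i] -= lr * g * v
    return w

def predict(w, feats):
    # Predicted click probability for one sparse feature vector.
    return sigmoid(sum(w[i] * v for i, v in feats.items()))

# Toy CTR-style data: feature 0 correlates with clicks, feature 1 with non-clicks.
data = [({0: 1.0}, 1), ({1: 1.0}, 0)] * 50
w = train_lr(data, dim=2)
print(predict(w, {0: 1.0}), predict(w, {1: 1.0}))
```

The sparse-dict representation matters here: CTR feature vectors are typically extremely high-dimensional but mostly zero, so each gradient update touches only the handful of features present in that sample.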
Bios:
James Peng is a Principal Architect at Baidu, where he steers the engineering direction for several divisions, including monetization platforms, the infrastructure department, and the data science and big data platform. The projects that he initiated and led have made significant contributions to a wide range of core products. Before joining Baidu, James was on Google's Mountain View engineering team, where he worked on various projects in the AdWords system. Prior to Google, he was a Research Associate at Stanford University, where his research focused on distributed computing, data modeling, and large-scale databases.
Quan Wang is a member of the big data infrastructure team at Baidu USDC, working on distributed feature extraction and model training for multimedia data on Spark.
Weide Zhang is a Senior Architect at Baidu Inc., working on big data infrastructure. Before Baidu, he spent seven years working in various areas of system development, including distributed serving systems, search infrastructure, and machine learning.
Ning Qu received his B.S. from the Computer Science Department and his Ph.D. from the Microprocessor Research and Development Center at Peking University. He then joined a security group at CMU, focusing on system security, mainly tiny-hypervisor solutions for protecting the OS and applications. In 2009 he joined a future-CPU group at Nvidia, and in 2011 he joined Google, working on system security and cloud security projects on the Production Kernel team. Since 2014 he has been with the Baidu USDC Infrastructure team, working on next-generation distributed computing platforms.
Tianbing Xu is a member of the engineering team in Baidu's infrastructure group. His work sits at the intersection of distributed computing and machine learning. He is currently building simple, effective training systems with highly optimized logistic regression algorithms to address performance and scalability challenges, achieving high prediction accuracy with short training times for Baidu's large-scale problems.