
"Machine Learning at the Limit" By Prof. John Canny

Hosted By
Chester C. and Sara A.
"Machine Learning at the Limit" By Prof. John Canny

Details

We are very excited to launch our new meetup with Prof. John Canny from UC Berkeley. This talk will be a joint meetup with SF Machine Learning. The details of his talk follow.

Machine Learning at the Limit

John Canny, UC Berkeley

How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network, etc. This can lead to dramatic improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve two to three orders of magnitude improvements over other toolkits on single machines. These speedups are larger than have been reported for cluster systems (e.g. Spark/MLlib, PowerGraph) running on hundreds of nodes, and BIDMach with a GPU outperforms these systems on most common machine learning tasks.

For algorithms (e.g. graph algorithms) which do require cluster computing, we have developed a rooflined network primitive called "Kylix". We can show that Kylix approaches the roofline limits for sparse Allreduce, and it empirically holds the record for distributed PageRank.

Beyond rooflining, we believe there are great opportunities in deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very general tool for inference, but it is typically much slower than alternatives. SAME (State Augmentation for Marginal Estimation) is a variation of GS developed for marginal parameter estimation. We show that it has high parallelism and admits a fast GPU implementation. Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running time is 100x faster than other samplers, and within 3x of the fastest symbolic methods. We are extending this approach to general graphical models, an area where there is currently a void of practically fast tools. It seems at least plausible that a general-purpose solution based on these techniques can closely approach the performance of custom algorithms.
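
To make the roofline idea concrete, here is a minimal Python sketch of the standard roofline model: a kernel's attainable throughput is bounded by the lesser of peak compute and arithmetic intensity times memory bandwidth. The hardware numbers are made-up placeholders, not figures from BIDMach or the talk.

# Minimal roofline-model sketch (illustrative only; hardware numbers are hypothetical).
# Attainable throughput = min(peak compute, arithmetic_intensity * memory_bandwidth).

def roofline_gflops(flops, bytes_moved, peak_gflops=1000.0, bandwidth_gbs=150.0):
    """Estimate attainable GFLOP/s for a kernel.

    flops         -- floating-point operations the kernel performs
    bytes_moved   -- bytes it must move to/from memory
    peak_gflops   -- hypothetical peak compute of the device (GFLOP/s)
    bandwidth_gbs -- hypothetical memory bandwidth (GB/s)
    """
    intensity = flops / bytes_moved          # FLOPs per byte
    return min(peak_gflops, intensity * bandwidth_gbs)

# Example: sparse matrix-vector multiply has low arithmetic intensity,
# so the memory-bandwidth term dominates (memory-bound).
spmv = roofline_gflops(flops=2e9, bytes_moved=12e9)
# Dense matrix-matrix multiply has high intensity and hits the compute roof.
gemm = roofline_gflops(flops=2e12, bytes_moved=24e9)
print(f"SpMV attainable: {spmv:.1f} GFLOP/s (memory-bound)")
print(f"GEMM attainable: {gemm:.1f} GFLOP/s (compute-bound)")

Rooflined design in this sense means measuring each kernel against such a bound and tuning it until it sits near the roof, rather than comparing only against other implementations.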

Bio

John Canny is a professor in computer science at UC Berkeley. He is an ACM Dissertation Award winner and a Packard Fellow. He is currently a Data Science Senior Fellow in Berkeley's new Institute for Data Science and holds an INRIA (France) International Chair. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and prototyped production systems for Overstock.com, Yahoo, eBay, Quantcast and Microsoft. He currently works on several applications of data mining for human learning (MOOCs and early language learning), health and well-being, and applications in the sciences.

Agenda:

6:15 pm -- doors open / check-in

6:15 - 6:50 pm -- social and networking

6:50 - 6:55 pm -- announcements

6:55 - 7:05 pm -- host introduction

7:05 - 8:15 pm -- speaker talk

8:15 - 8:30 pm -- Q&A

8:45 - 9:00 pm -- end and office closes

SF Big Analytics
Alpine Data Labs
1550 Bryant Street · San Francisco, CA