
Details

Please register at our event partner AICamp site https://www.aicamp.ai/event/eventdetails/W2024042512 to obtain Zoom Link.

Agenda
12:00 pm PT -- 12:05 pm PT Join online
12:05 pm PT -- 12:40 pm PT Talk 1 (Google)
12:40 pm PT -- 1:15 pm PT Talk 2 (NVIDIA)
1:15 pm PT -- 1:50 pm PT Talk 3 (Apple)

Talk 1: Federated Learning with Public and Private Data: From Small Models to Large, and Back

In this talk, we will discuss federated learning in the cross-device setting with limited client resources. We consider the implications of model size and the use of public and private data for practical applications. For on-device models in production, we train language models (LMs) with federated learning (FL) and differential privacy (DP) in the Google Keyboard (Gboard). We apply the DP-Follow-the-Regularized-Leader (DP-FTRL) algorithm to achieve meaningful formal DP guarantees without requiring uniform sampling of client devices. To provide favorable privacy-utility trade-offs, we introduce a new client participation criterion and discuss the implications of its configuration in large-scale systems. With the help of pretraining on public data, we train and deploy more than twenty Gboard LMs that achieve high utility and ρ-zCDP privacy guarantees with ρ ∈ (0.2, 2), with two models additionally trained with secure aggregation. We are happy to announce that all the next-word-prediction neural network LMs in Gboard now have DP guarantees, and all future launches of Gboard neural network LMs will require DP guarantees. By applying best practices, carefully choosing the configuration, and using the new Matrix-Factorization-based (MF-)DP-FTRL algorithms, we trained and launched two production Gboard language models with strong DP guarantees of 0.0144-zCDP, or equivalently (ε=0.994, δ=1e-10)-DP. As far as we know, this is the first production machine learning model announced with ε < 1. We further discuss a few research explorations to increase the model size, and how very large models interact with on-device applications.
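To give a sense of how the zCDP and (ε, δ)-DP figures in the abstract relate, here is a small sketch of the standard zCDP-to-DP conversion bound, ε = ρ + 2√(ρ · ln(1/δ)). Note this textbook bound is looser than the tighter numerical accounting used for the reported production guarantee, so it yields an ε somewhat above the quoted 0.994; the function name and numbers here are illustrative, not from the talk's actual accounting code.

```python
import math

def zcdp_to_dp(rho: float, delta: float) -> float:
    """Convert a rho-zCDP guarantee to an (epsilon, delta)-DP guarantee
    using the standard bound: epsilon = rho + 2*sqrt(rho * ln(1/delta)).
    Tighter numerical conversions give a smaller epsilon for the same rho."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

# The abstract's production model: 0.0144-zCDP at delta = 1e-10.
eps = zcdp_to_dp(0.0144, 1e-10)
print(f"epsilon <= {eps:.3f}")  # ~1.17 with this loose bound
```

Running this shows why the tighter conversion matters: the simple bound certifies roughly ε ≤ 1.17, while the sharper accounting cited in the talk reaches ε = 0.994 for the same ρ and δ.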

Speaker: Zheng Xu (Google)

Zheng Xu is a research scientist working on federated learning and privacy at Google. He earned his Ph.D. in optimization and machine learning from the University of Maryland, College Park, in 2019. Before that, he received his Master's and Bachelor's degrees from the University of Science and Technology of China.

Talk 2: Federated Learning: Towards Real-world Studies

This talk will cover an end-to-end discussion of applying federated learning (FL) in real-world studies, from theoretical algorithm design to practical framework implementation. Given that the fundamentals of FL address the pivotal balance between data privacy and the collaborative enhancement of machine learning (ML) models, we will discuss the special challenges of, and solutions for, embedding FL in AI model development. Specifically, we will talk about practical frameworks with respect to system design and implementation, and further discuss the systematic requirements and features, especially in the age of LLMs. Ultimately, this talk underscores the transformative potential of FL in a general setting beyond deep learning, offering insights into its current achievements and future possibilities.

Speaker: Ziyue Xu (NVIDIA)

Ziyue Xu is a senior scientist at NVIDIA. His research interests are in the area of image analysis and computer vision with applications in biomedical and clinical imaging. He has been working on medical AI since 2007 along with fellow researchers and clinicians. He is an IEEE Senior Member and Associate Editor for the journals IEEE Transactions on Medical Imaging (TMI), Journal of Biomedical and Health Informatics (JBHI), Computerized Medical Imaging and Graphics (CMIG), and Computers in Biology and Medicine (CBM).

Talk 3: pfl-research simulation framework for accelerating research in Private Federated Learning (PFL)

For this talk: No live Q&A, No Video Recording

Federated Learning (FL) is an emerging Machine Learning (ML) paradigm where training data stays private on user/edge devices and aggregated model updates help us learn from a population. Differential Privacy (DP) is considered the industry standard for a rigorous privacy guarantee over statistics and model gradients. Large-scale deployments of FL and DP are challenging but growing due to increased privacy awareness and regulations. On behalf of Apple, we introduce pfl-research, an open-source framework for PFL simulations, with the goal of equipping and accelerating the research community and improving the reproducibility of FL dataset and algorithm benchmarks.
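The pattern described above — private data stays on devices while the server learns only from aggregated, privatized updates — can be sketched as one server round of DP federated averaging. This is a generic illustration with NumPy, not the pfl-research API; the function name, the per-coordinate noise calibration, and the fixed clipping bound are all simplifying assumptions.

```python
import numpy as np

def dp_fedavg_round(global_model, client_updates, clip_norm, noise_multiplier, rng):
    """One server round: clip each client's model delta to at most clip_norm
    in L2 norm, average the clipped deltas, and add Gaussian noise scaled
    to the clipping bound so no single client dominates the aggregate."""
    clipped = []
    for delta in client_updates:
        norm = np.linalg.norm(delta)
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        clipped.append(delta * scale)
    mean_delta = np.mean(clipped, axis=0)
    # Per-coordinate noise std; the averaging divides the sensitivity by
    # the number of participating clients.
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return global_model + mean_delta + rng.normal(0.0, sigma, size=mean_delta.shape)

# Illustrative usage: two toy client deltas, clipping bound 1.0, no noise.
rng = np.random.default_rng(0)
model = np.zeros(3)
updates = [np.array([3.0, 4.0, 0.0]), np.array([0.0, 0.0, 1.0])]
new_model = dp_fedavg_round(model, updates, clip_norm=1.0,
                            noise_multiplier=0.0, rng=rng)
print(new_model)  # first delta clipped from norm 5 to norm 1 before averaging
```

A real simulator layers cohort sampling, secure aggregation, and a DP accountant on top of this loop, but the clip-average-noise core is the piece that connects FL to the DP guarantees discussed in the talks.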

Speaker: Mona Chitnis (Apple)

Mona leads an engineering team in the Privacy Preserving ML space at Apple, with a focus on Federated Learning. Prior to working on privacy, Mona was a technical lead on several data engineering teams at Apple, and an Apache open-source committer on the Hadoop team at Yahoo.

Related topics

Machine Learning
Text Analytics
Hadoop
Big Data
Data Analytics
