Distributed Training with Apache MXNet using Horovod


Abstract:
Data-parallel distributed training on GPU machines makes it possible to train modern deep neural networks on large datasets within minutes to hours, making this technique essential for the practical use of deep learning. However, adapting a single-GPU deep learning model to a distributed, data-parallel environment is not a trivial task. Users must consider not only what data is shared and synchronized between model replicas and which communication method is used, but also how computation and data are distributed across multiple GPUs and machines.
Apache MXNet is a deep learning framework that supports both single-instance and distributed training and inference. Horovod is an open-source distributed deep learning framework built on high-performance communication primitives from the NVIDIA Collective Communications Library (NCCL). Horovod currently supports TensorFlow, MXNet, PyTorch, and Keras.
In this talk, we will introduce basic concepts of distributed training and demonstrate how to perform distributed training with MXNet and Horovod while achieving strong performance.
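For reference, distributed training with MXNet and Horovod generally follows a standard pattern: initialize Horovod, pin each worker process to one GPU, wrap the Gluon trainer so gradients are averaged across workers, and broadcast the initial parameters from rank 0. Below is a minimal sketch of that pattern; the model, data, and hyperparameters are placeholders rather than the example used in the talk.

    # Minimal sketch of data-parallel training with MXNet + Horovod.
    # The model, data, and hyperparameters are placeholders; launch with
    # multiple processes, e.g. `horovodrun -np 4 python train.py`.
    import mxnet as mx
    from mxnet import autograd, gluon
    import horovod.mxnet as hvd

    hvd.init()                              # start Horovod
    ctx = mx.gpu(hvd.local_rank())          # pin each process to one GPU

    # Toy model and synthetic data standing in for the real workload
    net = gluon.nn.Dense(10)
    net.initialize(mx.init.Xavier(), ctx=ctx)
    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
    data = mx.nd.random.uniform(shape=(64, 128), ctx=ctx)
    label = mx.nd.random.randint(0, 10, shape=(64,), ctx=ctx).astype('float32')

    # DistributedTrainer averages gradients across workers (NCCL allreduce);
    # the learning rate is commonly scaled by the number of workers.
    params = net.collect_params()
    trainer = hvd.DistributedTrainer(params, 'sgd',
                                     {'learning_rate': 0.01 * hvd.size()})

    # Broadcast initial parameters from rank 0 so all replicas start identically
    hvd.broadcast_parameters(params, root_rank=0)

    for epoch in range(3):
        with autograd.record():
            loss = loss_fn(net(data), label)
        loss.backward()
        trainer.step(data.shape[0])         # averaged update applied on every worker
        if hvd.rank() == 0:
            print('epoch %d, loss %.4f' % (epoch, loss.mean().asscalar()))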
Speaker info:
Lin Yuan
Lin is a Software Engineer at AWS building performant and easy-to-use deep learning frameworks. His recent focus is MXNet, an Apache Incubator project of which he is a committer. He works tirelessly to improve its performance in both distributed and single-instance applications.
Yuxi (Darren) Hu
Yuxi is a Software Development Engineer on the MXNet Engineering team at AWS AI. He is an active contributor to both the Apache MXNet and Horovod projects. He is passionate about making MXNet the most performant and easy-to-use framework for engineers and scientists developing deep learning applications.
