veScale: A PyTorch-Native Auto-Parallel AI Framework for Ease of Use

Hosted By
Sujata T.

Details
veScale: An Industrial-Level Framework for Ease of Use
- PyTorch Native: veScale is rooted in PyTorch-native data structures, operators, and APIs, so it benefits directly from the PyTorch ecosystem that dominates the ML world.
- Zero Model Code Change: veScale decouples distributed system design from model architecture, requiring zero or near-zero modification to users' model code.
- Single Device Abstraction: veScale provides single-device semantics to users, automatically distributing and orchestrating model execution across a cluster of devices (see the sketch after this list).
- Automatic Parallelism Planning: veScale parallelizes model execution with a synergy of strategies (tensor, sequence, data, ZeRO, and pipeline parallelism) under semi- or full automation [coming soon].
- Eager & Compile Mode: veScale supports not only Eager-mode automation for parallel training and inference but also Compile-mode for ultimate performance [coming soon].
- Automatic Checkpoint Resharding: veScale manages distributed checkpoints automatically, with online resharding across different cluster sizes and parallelism strategies.
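To make the single-device abstraction and zero-model-code-change ideas concrete, here is a minimal sketch using PyTorch's upstream DTensor tensor-parallel API, which veScale builds on; it is not veScale's own API. The model definition, mesh shape, and module names (`MLP`, `up`, `down`, 4 GPUs) are illustrative assumptions, not taken from the talk. The point is that the model code reads like ordinary single-device PyTorch, while parallelization is declared externally.

```python
# Minimal sketch of the "single-device abstraction" idea using PyTorch's
# upstream DTensor tensor-parallel API (not veScale's own API).
# Run under: torchrun --nproc_per_node=4 tp_sketch.py
import os
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    parallelize_module, ColwiseParallel, RowwiseParallel,
)

class MLP(nn.Module):  # ordinary single-device model code, unmodified
    def __init__(self, dim=1024):
        super().__init__()
        self.up = nn.Linear(dim, 4 * dim)
        self.down = nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
mesh = init_device_mesh("cuda", (4,))   # 1-D device mesh over 4 GPUs
model = MLP().cuda()

# Parallelism is declared outside the model: shard `up` column-wise and
# `down` row-wise; the forward pass above stays untouched.
model = parallelize_module(model, mesh, {
    "up": ColwiseParallel(),
    "down": RowwiseParallel(),
})

# From the user's perspective this still looks like single-device execution.
out = model(torch.randn(8, 1024, device="cuda"))
```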
Ziang Song: Ziang Song is a research scientist on ByteDance's LLM team. He specializes in scaling up distributed training systems for large language models and multimodal models, and is one of the founding members of veScale, ByteDance's PyTorch-native distributed training framework. Before joining ByteDance, Ziang was a researcher at CMU, collaborating closely with Microsoft Research and JHU-CLSP on both ML algorithms and distributed systems.

Open Source Development

Online event
This event has passed