
veScale: A PyTorch-Native Auto-Parallel AI Framework for Ease of Use

Hosted by Sujata Tibrewala

Details

veScale: An Industrial-Level Framework for Ease of Use

  • 🔥 PyTorch Native: veScale is rooted in PyTorch-native data structures, operators, and APIs, so it enjoys the full PyTorch ecosystem that dominates the ML world.
  • 🛡 Zero Model Code Change: veScale decouples distributed system design from model architecture, requiring zero or near-zero changes to users' model code.
  • 🚀 Single Device Abstraction: veScale provides single-device semantics to users, automatically distributing and orchestrating model execution across a cluster of devices (see the first sketch after this list).
  • 🎯 Automatic Parallelism Planning: veScale parallelizes model execution with a synergy of strategies (tensor, sequence, data, ZeRO, and pipeline parallelism) under semi- or full automation [coming soon].
  • ⚡ Eager & Compile Mode: veScale supports not only Eager-mode automation for parallel training and inference but also Compile-mode for ultimate performance [coming soon].
  • 📤 Automatic Checkpoint Resharding: veScale manages distributed checkpoints automatically, with online resharding across different cluster sizes and parallelism strategies (see the second sketch after this list).
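
To give a taste of the single-device abstraction, here is a minimal sketch written against PyTorch's own DTensor API (torch.distributed.tensor), the PyTorch-native mechanism veScale is rooted in. This is illustrative only, not veScale's actual API; the 4-GPU mesh size and tensor shapes are assumptions.

```python
# Minimal single-device-abstraction sketch using PyTorch DTensor
# (requires a recent PyTorch; run e.g. torchrun --nproc-per-node=4 demo.py).
# This illustrates the PyTorch-native mechanism, NOT veScale's own API.
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard, Replicate

# Build a 1-D device mesh over 4 GPUs (an assumed cluster size).
mesh = init_device_mesh("cuda", (4,))

# The user writes ordinary single-device tensors ...
weight = torch.randn(1024, 1024)
x = torch.randn(8, 1024)

# ... and the framework distributes them: each rank holds a 256 x 1024
# local shard of `weight`, while the DTensor behaves like the full tensor.
dweight = distribute_tensor(weight, mesh, placements=[Shard(0)])
dx = distribute_tensor(x, mesh, placements=[Replicate()])

# Ordinary PyTorch ops dispatch to sharded execution under the hood;
# the result is itself a DTensor with its own placement.
y = dx @ dweight.T
print(y.placements, y.to_local().shape)
```

The point of the abstraction is that the model code above reads as if it ran on one device; the mesh and placements are the only distributed-systems concepts exposed, and an auto-parallel planner can choose them for the user.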
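
Similarly, the checkpoint-resharding idea can be sketched with PyTorch's distributed checkpoint API (torch.distributed.checkpoint), the PyTorch-native building block for saving sharded state and reloading it under a different world size or parallel layout. veScale's own checkpoint manager automates this workflow; the model and checkpoint path below are hypothetical stand-ins.

```python
# Hedged sketch of online checkpoint resharding via PyTorch's
# distributed checkpoint API. The model and path are hypothetical;
# this shows the underlying mechanism, not veScale's own manager.
import torch
import torch.distributed.checkpoint as dcp

model = torch.nn.Linear(1024, 1024)  # stand-in for a (possibly sharded) model
state_dict = {"model": model.state_dict()}

# Save from one job layout (e.g. an 8-rank tensor-parallel run) ...
dcp.save(state_dict, checkpoint_id="ckpt/step_1000")

# ... and load into a job with a different cluster size or parallelism:
# the loader matches each rank's (re)sharded tensors against the saved
# global tensors, so no offline conversion pass is needed.
dcp.load(state_dict, checkpoint_id="ckpt/step_1000")
model.load_state_dict(state_dict["model"])
```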

Ziang Song: Ziang Song is a research scientist on ByteDance's LLM team. He specializes in scaling up distributed training systems for large language models and multimodal models, and is one of the founding members of veScale, ByteDance's PyTorch-native distributed training framework. Before joining ByteDance, Ziang was a researcher at CMU, collaborating closely with Microsoft Research and JHU-CLSP on both ML algorithms and distributed systems.

Open Source Development group · Online event