

cuda-oxide: Bringing CUDA to Rust with a custom rustc backend

Abstract:

Using a GPU to accelerate a Rust application has a subpar developer experience today: the moment you need a kernel, you have to leave Rust. The kernel goes into a separate `.cu` file built by `nvcc`, becomes an opaque PTX blob, and is launched from the host, where you mostly just hope the host-side Rust types match the kernel's. The cost is steep: no type checking across the host/device boundary, none of Rust's safety and language guarantees in the kernel itself, no IDE support, and no single `cargo build`. You give up the Rust toolchain exactly where you'd most want it. cuda-oxide closes that gap by keeping the kernel in Rust: a `#[kernel]` function is ordinary Rust, type-checked and borrow-checked by `rustc` and lowered to PTX by a custom codegen backend, with host and device code in the same crate and the same build. Host code gets the full Rust ecosystem, and because kernels are `no_std`, any crate that depends only on `core` works in device code too.

This talk is about how that works, and it focuses on the design rather than the plumbing: why we chose to build a custom `rustc` backend, how Rust's type system can be leveraged to build safe GPU abstractions (thread indices as proofs of uniqueness, race-free parallel slices, scoped atomics), and how the principle "use the best tool for each stage, own the whole pipeline" plays out across `rustc`, `pliron`, and LLVM's NVPTX backend. I'll show what runs today, including a real tensor-core GEMM on Blackwell written entirely in Rust, and, if time permits, how existing CUDA C++ fits in through device FFI.
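To make "thread indices as proofs of uniqueness" and "race-free parallel slices" a little more concrete, here is a hypothetical, CPU-only sketch of the general idea in plain Rust. It is not cuda-oxide's API: `ThreadIndex`, `DisjointSlice`, `launch`, and `vector_add` are invented names, and OS threads stand in for GPU threads. The point it illustrates is that a non-copyable index token, which only the launcher can create, lets a shared slice hand out `&mut` access to exactly one element per thread without any runtime locking.

```rust
/// CPU-only sketch of two GPU safety ideas; all names are hypothetical and
/// OS threads stand in for GPU threads.
mod gpu_sim {
    use std::marker::PhantomData;
    use std::thread;

    /// Proof that the holder is the only thread with this index.
    /// Not `Clone`/`Copy`, and only `launch` (in this module) can construct one.
    pub struct ThreadIndex(usize);

    impl ThreadIndex {
        pub fn get(&self) -> usize {
            self.0
        }
    }

    /// A mutable slice shared by all threads of a launch; each thread may
    /// write only the element selected by its own `ThreadIndex`.
    pub struct DisjointSlice<'a, T> {
        ptr: *mut T,
        len: usize,
        _borrow: PhantomData<&'a mut [T]>,
    }

    // SAFETY: threads only ever obtain `&mut T` to distinct elements
    // (see `get_mut`), so sharing the wrapper is sound when `T: Send`.
    unsafe impl<T: Send> Sync for DisjointSlice<'_, T> {}

    impl<'a, T> DisjointSlice<'a, T> {
        pub fn new(slice: &'a mut [T]) -> Self {
            DisjointSlice { ptr: slice.as_mut_ptr(), len: slice.len(), _borrow: PhantomData }
        }

        /// Returns this thread's element, or `None` if out of bounds. The result
        /// borrows the token mutably, so one thread cannot hold two live
        /// references to its slot at once.
        pub fn get_mut<'t>(&'t self, idx: &'t mut ThreadIndex) -> Option<&'t mut T> {
            if idx.0 < self.len {
                // SAFETY: in bounds, and every live `ThreadIndex` in a launch
                // holds a distinct value, so no two `&mut T` alias.
                Some(unsafe { &mut *self.ptr.add(idx.0) })
            } else {
                None
            }
        }
    }

    /// Simulated kernel launch: runs `kernel` once per index on scoped OS threads.
    /// The kernel only receives `&mut ThreadIndex`, so it cannot store or
    /// duplicate the token beyond its own invocation.
    pub fn launch<F>(num_threads: usize, kernel: F)
    where
        F: Fn(&mut ThreadIndex) + Sync,
    {
        thread::scope(|s| {
            for i in 0..num_threads {
                let kernel = &kernel;
                s.spawn(move || {
                    let mut tid = ThreadIndex(i);
                    kernel(&mut tid);
                });
            }
        });
    }
}

use gpu_sim::{launch, DisjointSlice, ThreadIndex};

/// Element-wise add written "kernel style": one simulated thread per element.
pub fn vector_add(a: &[i32], b: &[i32]) -> Vec<i32> {
    assert_eq!(a.len(), b.len());
    let mut c = vec![0; a.len()];
    {
        let out = DisjointSlice::new(&mut c);
        launch(a.len(), |tid: &mut ThreadIndex| {
            let i = tid.get();
            if let Some(slot) = out.get_mut(tid) {
                *slot = a[i] + b[i];
            }
        });
    } // `out` is dropped here, releasing the exclusive borrow of `c`
    c
}
```

Calling `vector_add(&[1, 2, 3], &[10, 20, 30])` yields `[11, 22, 33]`. The design choice worth noting is that uniqueness lives in the types rather than in a runtime check: a kernel that tried to write `out` at some other thread's index simply has no token to do it with, so the borrow checker rejects the race before the code ever runs.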

Related topics

Events in Bengaluru
Distributed Systems
Programming Languages
Compilers
