Compiler Optimizations for CPU-GPU Heterogeneous Systems [Asia Time Friendly]
Details
Modern high-performance applications increasingly rely on CPU-GPU heterogeneous systems, yet their performance is often limited by poorly placed global synchronization barriers, synchronous memory transfers, and default-stream CUDA semantics that prevent overlap between computation and communication. This talk presents compiler-driven techniques to systematically remove these bottlenecks. I will first introduce hetero-sync motion, which safely relocates barrier instructions to enable greater CPU-GPU concurrency, and sync2async, which automatically transforms synchronous data transfers and kernel launches on the default stream into non-default-stream asynchronous calls with correct stream allocation and synchronization. Both techniques rely on precise, context-sensitive, flow-sensitive inter-procedural data-flow analyses implemented in LLVM/Clang, and deliver significant speedups on modern GPUs.
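To make the sync2async idea concrete, here is a small hand-written sketch of the kind of rewrite described above. It is not taken from the talk or from the compiler's actual output: the kernel `scale` and the functions `run_sync` and `run_async` are hypothetical names chosen for illustration, error checking is omitted for brevity, and the input is assumed to be non-empty. The first function uses blocking copies and a default-stream launch; the second shows roughly what an equivalent asynchronous, non-default-stream version could look like.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical kernel for illustration: multiplies each element by a factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// "Before": blocking copies and a kernel launch on the default stream.
// The host waits inside each cudaMemcpy call.
std::vector<float> run_sync(const std::vector<float> &in, float factor) {
    int n = static_cast<int>(in.size());
    assert(n > 0);  // sketch assumes a non-empty input
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    std::vector<float> out(n);
    float *d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, in.data(), bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d, n, factor);
    cudaMemcpy(out.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}

// "After": the same work issued asynchronously on a non-default stream.
// Pinned host memory is used so the copies can actually run asynchronously.
std::vector<float> run_async(const std::vector<float> &in, float factor) {
    int n = static_cast<int>(in.size());
    assert(n > 0);  // sketch assumes a non-empty input
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *h = nullptr, *d = nullptr;
    cudaStream_t s;
    cudaStreamCreate(&s);
    cudaMallocHost(&h, bytes);
    std::copy(in.begin(), in.end(), h);
    cudaMalloc(&d, bytes);
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, s);
    scale<<<(n + 255) / 256, 256, 0, s>>>(d, n, factor);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, s);
    // Independent CPU work could run here while the GPU is busy.
    cudaStreamSynchronize(s);  // wait only when the result is needed
    std::vector<float> out(h, h + n);
    cudaFree(d);
    cudaFreeHost(h);
    cudaStreamDestroy(s);
    return out;
}
```

Both functions should return the same values for the same input; the difference is that the second version leaves the host free between issuing the work and the single `cudaStreamSynchronize` call. Doing this rewrite by hand requires choosing streams and synchronization points correctly, which is the step the talk proposes to automate.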
I will briefly touch upon our ongoing work on optimizing Unified Memory programs using static analysis to reduce unnecessary on-demand page migrations. Together, these efforts show how compiler analysis can unlock concurrency and efficiency in heterogeneous systems without increasing programmer burden.
Note:
1. This meetup is designed to be Asia time zone-friendly.
2. This meetup will be RECORDED.
3. The event is open to participants of all genders.
