In-Network Machine Learning for Time-Sensitive Applications
Details
Abstract: This talk explores the use of in-network machine learning as a new paradigm for accelerating time-sensitive ML tasks. By offloading inference workloads to programmable network devices, in-network ML leverages underutilized compute resources within the data plane, achieving significant performance and energy-efficiency gains. Building on this idea, the talk presents a general methodology for prototyping, deploying, and evaluating ML inference directly inside programmable switches and NICs. Several practical applications are showcased, including ultra-low-latency stock market prediction and real-time transaction fraud detection. The talk also introduces new data structures and in-network feature-engineering algorithms tailored to resource-constrained network hardware, alongside a hybrid deployment strategy that integrates network devices with end-host servers for further acceleration. Together, these results demonstrate that in-network ML is both feasible and highly effective for time-sensitive tasks, achieving sub-microsecond inference latency and performance competitive with traditional server-based systems.
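In-network inference of the kind described above is commonly realized by compiling a trained model, such as a small decision tree, into the match-action tables that switch pipelines natively execute: a range-match stage bins each feature, and an exact-match stage maps the resulting bin codes to a prediction. A minimal Python sketch of that idea follows; all feature names, thresholds, and labels are hypothetical and not taken from the talk.

```python
# Illustrative sketch: mapping a small decision tree onto lookup tables,
# mimicking match-action inference in a programmable data plane.
# All feature names, thresholds, and class labels are hypothetical.

from bisect import bisect_right

# Stage 1: per-feature split thresholds from a (hypothetical) trained tree.
# Each feature's raw value is range-matched into a bin index.
THRESHOLDS = {
    "pkt_size": [128, 512],         # bins: <128, 128-511, >=512
    "inter_arrival_us": [10, 100],  # bins: <10, 10-99, >=100
}

def feature_code(feature, value):
    """Range match: map a raw feature value to its bin index."""
    return bisect_right(THRESHOLDS[feature], value)

# Stage 2: exact match on the tuple of bin indices -> predicted class.
# In hardware this would be a single match-action table lookup.
DECISION_TABLE = {
    (0, 0): "benign", (0, 1): "benign", (0, 2): "benign",
    (1, 0): "fraud",  (1, 1): "benign", (1, 2): "benign",
    (2, 0): "fraud",  (2, 1): "fraud",  (2, 2): "benign",
}

def infer(pkt_size, inter_arrival_us):
    """Two table lookups replace tree traversal at packet rate."""
    key = (feature_code("pkt_size", pkt_size),
           feature_code("inter_arrival_us", inter_arrival_us))
    return DECISION_TABLE[key]

print(infer(600, 5))    # large packet, tiny gap -> fraud
print(infer(100, 200))  # small packet, long gap -> benign
```

The appeal of this encoding is that both stages are constant-time table lookups, which is what lets switch ASICs serve predictions at line rate with latencies far below those of a server round trip.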
Speaker: Xinpeng Hong received his PhD in Engineering Science from the University of Oxford. His research interests lie at the intersection of networking, in-network computing, and machine learning, with a focus on time-sensitive applications of in-network machine learning. His work has been published in venues including PACMNET, ACM TODAES, ACM SIGCOMM Computer Communication Review (CCR), IEEE Communications Surveys & Tutorials (COMST), ACM CoNEXT, ECAI, and IEEE HPSR. In 2024, his research received the Prototypes for Humanity Award from the Dubai Future Foundation.
