We will have two talks this evening, from Michael Zimmerman and Eddy Reyes.
Special thanks to Riley McKenna of Computer Futures for sponsoring the food!
Virtualizing AI Workloads (Elastic AI)
We will outline a new virtualization middleware that creates a virtual layer for ML/AI frameworks (such as TensorFlow and PyTorch) and runs on top of the new generation of AI hardware (such as GPUs, FPGAs, and AI ASICs). In this talk, you will learn the concept of elastic AI and how remotely attached GPUs can be used from any virtual machine on the network, whether that VM resides on the GPU server itself or on a GPU-less compute machine. We will explore the architecture of a distributed AI virtualization layer and how it operates on top of an AI server cluster. We will close with a few use cases: GPUaaS, MLaaS, and sharing and automating GPU dev/test environments.
Michael Zimmerman is the CEO of Bitfusion. Before Bitfusion, Michael served as General Manager of Marvell's networking business, where he was responsible for the networking and processor portfolio for the Enterprise and Data Center segments. Before Marvell, Michael spent four years as an executive at two privately held industry leaders: Annapurna Labs, which developed high-performance Smart Network Interface Cards (acquired by Amazon Web Services), and, before that, Tilera, the MIT-based many-core processor company (acquired by Mellanox). Michael completed the Stanford Executive Business Program and holds an MS in Computer Science (NSU University) as well as an MBA and a BSEE (summa cum laude) from Tel Aviv University.
Observing and Tracing Software at Scale
Eddy Reyes has spent his career making software easier to create and maintain at all levels of the stack. He is the cofounder of Mindsight, which has created next-generation monitoring and tracing technology to help software teams maintain large-scale systems more effectively. Before Mindsight, Eddy built and improved many products as a systems, kernel, and analytics engineer at IBM, Logitech, and various startups.
Microservices and the tools to scale them have enabled development teams to move faster than ever before. However, this speed comes at the cost of added complexity. Systems that are harder to understand and observe are also harder to fix and improve when they fail to meet customer needs, so data that helps teams observe and understand these systems is more important than ever. In this talk, we will explore how distributed tracing and the next generation of code-level intelligence can improve system observability and shorten response times for the teams responsible for keeping these systems reliable.