Talk Series: Event Streaming with Mantis & Scaling Computations with Ray


Details
Whether you're a seasoned data engineer or just starting out, this event will provide valuable insights and practical tips on the work we do. Come network and hear from like-minded professionals in the Bay Area who are passionate about all things data engineering!
Title: Scaling Embedding Computations for RAG apps with Ray
Speaker: Cheng Su, Ray Data Lead Engineer @ Anyscale
Abstract: Developing a Retrieval Augmented Generation (RAG) app requires integrating multiple components, including an LLM, a vector database, and embeddings computed from very large datasets. The embedding process involves a distributed read of numerous documents, chunking and tokenizing them, using an embedding model to compute embeddings (on multiple GPUs), and upserting the embeddings into the vector DB.
We will introduce Ray Data, an open-source, distributed ML data processing library that enables building data pipelines to generate 1 billion embeddings in under a day on a limited budget. We'll cover Ray Data basics, its application to scalable data ingestion for RAG apps, managing heterogeneous clusters (CPU and GPU), throughput optimization, and pipeline design choices.
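As a rough illustration of the stages named in the abstract (chunk, embed, upsert), here is a minimal single-process sketch in plain Python. It is not Ray Data code and not the speaker's pipeline: the function names `chunk_text`, `toy_embed`, and `upsert` are hypothetical, the "embedding" is a hash-based stand-in rather than a real model, and the "vector DB" is just a dictionary. In a real deployment, each stage would be distributed across CPU and GPU workers.

```python
import hashlib
from typing import Dict, Iterable, List


def chunk_text(text: str, chunk_size: int = 5) -> List[str]:
    """Split a document into chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def toy_embed(chunk: str, dim: int = 4) -> List[float]:
    """Stand-in for an embedding model (assumption): dim floats in [0, 1] from a hash."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def upsert(store: Dict[str, List[float]], chunks: Iterable[str]) -> None:
    """Insert or overwrite one embedding per chunk in an in-memory store."""
    for chunk in chunks:
        store[chunk] = toy_embed(chunk)


# Hypothetical input documents; a real pipeline would read these from storage.
documents = [
    "Ray Data builds distributed data pipelines for ML workloads",
    "Mantis processes operational event streams",
]

vector_store: Dict[str, List[float]] = {}
for doc in documents:
    upsert(vector_store, chunk_text(doc))

print(f"stored {len(vector_store)} chunk embeddings")
```

The stages are kept as separate functions because that is the shape that lets a distributed framework scale them independently, for example running chunking on CPU workers and embedding on GPU workers.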
Title: Mantis: Stream processing for operational data
Speakers: Sundaram Ananthanarayanan, Senior Software Engineer @ Netflix, and Harshit Mittal, Senior Data Engineer @ Netflix
Abstract: TBA
---
Data Engineer Things, founded by Xinran Waibel, is a global online data engineering community for data professionals to connect and learn. Join us to grow together!
Slack: http://join.det.life
Medium: https://blog.det.life
YouTube Channel: https://www.youtube.com/@data-engineer-things/streams
Newsletter: https://dataengineerthings.substack.com/
By attending this meetup, you agree to abide by our Code of Conduct. Failure to comply with our CoC may result in removal from current and future DET events.
---
Special thanks to Datastrato and Real-Time Analytics Summit for sponsoring this meetup.
