Hands-on Workshop: RAG pipeline With Data Prep Kit + Milvus + Llama


ABSTRACT:
Whether you are building a RAG (Retrieval-Augmented Generation) application or fine-tuning a model, a significant portion of your time will go to data wrangling (cleaning, de-duping, removing markup, etc.). Data Prep Kit (https://github.com/IBM/data-prep-kit) can help you with that work.
Noteworthy features of DPK include: de-duping documents (exact dedupe and fuzzy dedupe), handling documents and code, language detection (spoken languages and programming languages), malware detection and creating embeddings.
In this hands-on workshop, we will demonstrate how to implement an end-to-end RAG pipeline using only open-source technologies:
- Data Prep Kit for processing documents
- Milvus as vector database
- Llama 3 as the LLM
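To give a feel for how the three pieces fit together, here is a minimal, library-free sketch of the retrieve-then-prompt flow. It is only an illustration under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for Milvus, and the final prompt is what would be sent to an LLM such as Llama 3. None of the function names below come from Data Prep Kit, Milvus, or Llama; they are hypothetical helpers.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Stand-in for the vector database: a list of (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Assemble the augmented prompt that would be passed to the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical, already-cleaned document chunks (the data prep step is assumed done).
chunks = [
    "Milvus is an open-source vector database.",
    "Data Prep Kit helps with cleaning and de-duplicating documents.",
    "Llama 3 is a large language model.",
]
index = build_index(chunks)
question = "Which vector database is open-source?"
top_chunks = retrieve(index, question, top_k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a full pipeline, each stand-in would be replaced by the corresponding tool: cleaned chunks and embeddings from the data preparation step, similarity search in the vector database, and generation by the LLM.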
What do you need to participate in this workshop?
- A laptop with a Python development environment (Setup Instruction)
- A Replicate account (FREE) - get one at replicate.com
INSTRUCTOR:
- Sujee Maniyam, Consulting AI Engineer / Developer Advocate
Sujee Maniyam is a seasoned practitioner focusing on Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He loves teaching and has taught and mentored thousands of professionals.
Sujee is a passionate user, advocate and contributor to open source. He is also an author and enjoys speaking at conferences and running hackathons and workshops, engaging with the community.
- Contact: sujee@sujee.net
- Portfolio: https://sujee.github.io/portfolio/
- LinkedIn: www.linkedin.com/in/sujeemaniyam
- GitHub: www.github.com/sujee
- Jiang Chen is the Head of Ecosystem and Developer Relations at Zilliz, the company behind the open-source vector database Milvus. Before joining Zilliz, he served as a tech lead and product manager at Google, where he led the development of web-scale semantic understanding and search indexing that powers innovative search products such as short video search. He has years of industry experience handling massive unstructured data and multi-modal content retrieval. Jiang holds a Master's degree in Computer Science from the University of Michigan.
In his talk, Jiang will give hands-on advice on building RAG applications with the open-source Milvus database and share best practices in search quality and performance optimization.
SPONSOR:
The AI Alliance
