Hands-on Workshop: RAG pipeline With Data Prep Kit + Milvus + Llama

Hosted By
Arivoli T.

Details

ABSTRACT:
Whether you are building a RAG (Retrieval-Augmented Generation) application or fine-tuning a model, a significant portion of your time will go to data wrangling (cleaning, de-duplicating, removing markup, etc.). Data Prep Kit (DPK, https://github.com/IBM/data-prep-kit) can help with that work.

Noteworthy features of DPK include: de-duping documents (exact dedupe and fuzzy dedupe), handling documents and code, language detection (spoken languages and programming languages), malware detection and creating embeddings.
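To make the difference between exact and fuzzy de-duping concrete, here is a minimal sketch in plain Python. It is not DPK's implementation or API: the function names, the SHA-256 hashing, the 3-word shingles and the 0.5 Jaccard threshold are all assumptions chosen for illustration.

```python
import hashlib
import re


def exact_dedupe(docs):
    """Keep the first copy of each document, ignoring case and extra whitespace."""
    seen = set()
    unique = []
    for doc in docs:
        normalized = " ".join(doc.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def shingles(text, k=3):
    """Return the set of k-word shingles for a text (illustrative choice of k)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def fuzzy_dedupe(docs, threshold=0.5):
    """Drop any document that is at least `threshold` similar to one already kept."""
    kept = []
    for doc in docs:
        if all(jaccard(doc, other) < threshold for other in kept):
            kept.append(doc)
    return kept
```

Exact dedupe only catches copies that are identical after normalization, while fuzzy dedupe also catches near-copies such as a sentence with a few extra words. The pairwise comparison here is quadratic and only suitable for small collections.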

In this hands-on workshop, we will demonstrate how to implement an end-to-end RAG pipeline using only open-source technologies:

  1. Data Prep Kit for processing documents
  2. Milvus as vector database
  3. Llama 3 as the LLM
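The three stages above (prepare documents, store and search vectors, prompt the LLM) can be sketched with standard-library stand-ins. This is a toy illustration only, not the workshop's code: the bag-of-words `embed` function stands in for a real embedding model, `InMemoryVectorStore` is a hypothetical stand-in for Milvus, and `build_prompt` only assembles the text that would be sent to Llama 3 rather than calling any model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as Milvus."""

    def __init__(self):
        self.rows = []  # list of (vector, original text) pairs

    def insert(self, text):
        self.rows.append((embed(text), text))

    def search(self, query, top_k=2):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
```

In use, each cleaned document chunk is passed to `insert`, a question is passed to `search`, and the resulting prompt from `build_prompt` is what would be handed to the LLM.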

What do you need to participate in this workshop?

  1. A laptop with a Python development environment (Setup Instructions)
  2. A Replicate account (FREE) - get one at replicate.com

INSTRUCTORS:

  1. Sujee Maniyam, Consulting AI Engineer / Developer Advocate

Sujee Maniyam is a seasoned practitioner focusing on Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He loves teaching and has taught and mentored thousands of professionals.
Sujee is a passionate user, advocate and contributor to open source. He is also an author and enjoys speaking at conferences and running hackathons and workshops, engaging with the community.

  2. Jiang Chen, Head of Ecosystem and Developer Relations at Zilliz

Jiang Chen is the Head of Ecosystem and Developer Relations at Zilliz, the company behind the open-source vector database Milvus. Before joining Zilliz, he was a tech lead and product manager at Google, where he led the development of web-scale semantic understanding and search indexing that powers search products such as short video search. He has years of industry experience handling massive unstructured data and multi-modal content retrieval. Jiang holds a Master's degree in Computer Science from the University of Michigan.

In his talk, Jiang will give hands-on advice on building RAG applications with the open-source Milvus database and share best practices in search quality and performance optimization.

SPONSOR:
The AI Alliance

Data Riders
Hacker Dojo
855 Maude Ave · Mountain View, CA