Hands-on Workshop: RAG pipeline With Data Prep Kit + Milvus + Llama

Name: Hands-on Workshop: RAG pipeline With Data Prep Kit + Milvus + Llama
Start: 2024-09-21T11:00:00-07:00
End: 2024-09-21T15:00:00-07:00
Location: Hacker Dojo

Hosted by Arivoli T.

Data Riders

Details

ABSTRACT:
Whether you are building a RAG (Retrieval-Augmented Generation) application or fine-tuning a model, a significant portion of your time will be spent on data wrangling (cleaning, de-duping, removing markup, etc.). Data Prep Kit (https://github.com/IBM/data-prep-kit) can help you with that work.

Noteworthy features of DPK include: de-duping documents (exact dedupe and fuzzy dedupe), handling documents and code, language detection (spoken languages and programming languages), malware detection and creating embeddings.

In this hands-on workshop, we will demonstrate how to implement an end-to-end RAG pipeline using only open-source technologies:

Data Prep Kit for processing documents
Milvus as vector database
Llama 3 as the LLM
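
To make the three stages above concrete, here is a minimal, dependency-free sketch of the flow: prepare documents, store and search them by similarity, then assemble a prompt for the LLM. This is not the workshop's code and uses none of the real APIs. `prepare`, `ToyVectorStore`, `embed`, and `build_prompt` are hypothetical stand-ins for Data Prep Kit, Milvus, an embedding model, and the call to Llama 3, and the sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prepare(docs):
    """Stand-in for data prep: normalise whitespace and drop exact duplicates."""
    seen, cleaned_docs = set(), []
    for doc in docs:
        cleaned = " ".join(doc.split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            cleaned_docs.append(cleaned)
    return cleaned_docs


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical, not the Milvus API)."""

    def __init__(self):
        self.rows = []  # list of (vector, chunk) pairs

    def insert(self, chunk):
        self.rows.append((embed(chunk), chunk))

    def search(self, query, k=2):
        query_vec = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(query_vec, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up sample documents; the second is a whitespace variant of the first.
docs = [
    "Milvus is an open-source vector database.",
    "Milvus  is an open-source vector database.",
    "Llama 3 is a large language model.",
    "Data Prep Kit helps with data wrangling.",
]

store = ToyVectorStore()
for chunk in prepare(docs):
    store.insert(chunk)

question = "Which vector database is open-source?"
prompt = build_prompt(question, store.search(question, k=1))
print(prompt)
```

In a real pipeline each stand-in would be replaced by the corresponding tool, but the shape stays the same: clean and de-duplicate, embed and index, retrieve the closest chunks, and pass them to the model alongside the question.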

What do you need to participate in this workshop?

A laptop with a Python development environment (Setup Instruction)
A Replicate account (FREE) - get one at replicate.com

INSTRUCTOR:

Sujee Maniyam, Consulting AI Engineer / Developer Advocate

Sujee Maniyam is a seasoned practitioner focusing on Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He loves teaching and has taught and mentored thousands of professionals.
Sujee is a passionate user, advocate and contributor to open source. He is also an author and enjoys speaking at conferences and running hackathons and workshops, engaging with the community.

Contact: sujee@sujee.net
Portfolio: https://sujee.github.io/portfolio/
LinkedIn: www.linkedin.com/in/sujeemaniyam
GitHub: www.github.com/sujee

Jiang Chen is the Head of Ecosystem and Developer Relations at Zilliz, the company behind the open-source vector database Milvus. Before joining Zilliz, he served as a tech lead and product manager at Google, where he led the development of web-scale semantic understanding and search indexing that powers search products such as short video search. He has years of industry experience handling massive unstructured data and multi-modal content retrieval. Jiang holds a Master's degree in Computer Science from the University of Michigan.

In his talk, Jiang will give hands-on advice on building RAG applications with the open-source Milvus database and share best practices in search quality and performance optimization.

SPONSOR:
The AI Alliance
