Building & Scaling Enterprise RAG Apps – From Code to Multi-Modal RAGs

Name: Building & Scaling Enterprise RAG Apps – From Code to Multi-Modal RAGs
Start: 2025-08-23T14:00:00+05:30
End: 2025-08-23T17:30:00+05:30
Location: iMocha

Hosted By

SUCHETA G D.


Details

This meetup takes our community from a conceptual understanding of RAG to hands-on, enterprise-level implementation. Learning flow:

### 1. Recap and Set the Stage

Brief recap of the core building blocks of RAG (covered in the previous session on 2 Aug).
Why building an enterprise-grade RAG system differs from a proof of concept (PoC): data quality, latency, scale, security, evaluation, and more.

***

### 2. Real-World Case Study & Challenges at Each Stage

Pick a realistic enterprise use case (e.g., HR policy assistant, customer support knowledge base, financial document Q&A).
Flow (showing actual code outputs at each stage):

Data Ingestion:

Challenges: unstructured data, tables, images in PDFs, multilingual text, etc.
Tools: PDF parsers, OCR, data cleansing pipelines.

Chunking:

Challenges: preserving context across chunk boundaries (overlap), choosing an optimal chunk size.
Demo chunking logic and show how different settings impact downstream quality.
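
A minimal Python sketch of the two settings this demo varies, chunk size and overlap. It assumes plain character-based windows. Real pipelines often split on tokens or sentences instead, for example with LlamaIndex node parsers. `chunk_text` and the sample policy text are hypothetical and used only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` chars.

    Hypothetical helper: character windows are an assumption, not the demo's actual splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text, so we never emit
        # a trailing chunk that is already fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    # Made-up policy text, repeated to get a document of a few hundred characters.
    policy = "Employees accrue 1.5 days of paid leave per month. " * 10
    for size, overlap in [(100, 0), (100, 30), (200, 50)]:
        chunks = chunk_text(policy, size, overlap)
        print(f"size={size} overlap={overlap} -> {len(chunks)} chunks")
```

Larger overlaps keep sentences that straddle a boundary intact in at least one chunk. The cost is more chunks to embed and store.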

Embedding & Vector Stores:

Challenges: embedding quality, model selection, indexing strategies, cost & scalability.
Show vector outputs and discuss semantic drift issues.

Retrieval:

Challenges: precision vs recall, false positives, latency at scale.
Demo top-k retrieval and show how quality changes with different values of k.
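
A minimal sketch of top-k retrieval by cosine similarity, using only the standard library. The 3-dimensional "embeddings" are made-up toy vectors. A real embedding model (OpenAI, Cohere, etc.) produces hundreds of dimensions, and a vector store such as Chroma or Faiss does this search at scale. `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force search: score every chunk, return the k best (chunk_id, score) pairs."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for chunk embeddings (assumed, not produced by a real model).
    index = {
        "leave_policy": [0.9, 0.1, 0.0],
        "travel_policy": [0.2, 0.9, 0.1],
        "payroll_faq": [0.6, 0.3, 0.5],
        "it_helpdesk": [0.0, 0.1, 0.9],
    }
    query = [0.8, 0.2, 0.1]  # imagined embedding of "How many leave days do I get?"
    for k in (1, 2, 4):
        print(k, top_k(query, index, k))
```

Raising k tends to improve recall by pulling in more relevant chunks. It also brings in weaker matches that the LLM must then ignore, which is the precision vs. recall trade-off listed above.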

Generation (LLM):

Challenges: hallucination, instruction-following, answer sourcing.
Show the difference between RAG-constrained output and raw LLM output.
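
A minimal sketch of the comparison above. It shows one way a RAG-constrained prompt can differ from a raw prompt, assuming retrieved chunks arrive as `(source_id, text)` pairs. The instruction wording, the function names, and the sample chunks are hypothetical, and the actual LLM call is left out.

```python
def build_rag_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Prompt that restricts the model to retrieved chunks and asks for source citations.

    `chunks` is a list of (source_id, text) pairs, e.g. the output of a top-k retriever.
    """
    context = "\n".join(f"[{source_id}] {text}" for source_id, text in chunks)
    return (
        "Answer the question using ONLY the context below.\n"
        "Cite the source id in square brackets for every claim.\n"
        "If the context does not contain the answer, reply: \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_raw_prompt(question: str) -> str:
    """Baseline prompt with no retrieved context: the model answers from its own weights."""
    return f"Question: {question}\nAnswer:"


if __name__ == "__main__":
    # Made-up chunks for illustration only.
    retrieved = [
        ("hr-1", "Full-time employees accrue 18 days of paid leave per year."),
        ("hr-2", "Up to 5 unused leave days carry over to the next year."),
    ]
    question = "How many leave days can I carry over?"
    print(build_raw_prompt(question))
    print("-" * 40)
    print(build_rag_prompt(question, retrieved))
```

Sending both prompts to the same model makes the contrast visible. The raw prompt invites a generic or invented answer. The constrained prompt should produce an answer grounded in, and citing, `hr-2`.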

***

### 3. Overcoming Limitations of RAG

Hybrid search: combining semantic and keyword search so that rare terms are not missed.
Metadata filtering: restricting the search space to relevant context (e.g., department-specific queries).
Guardrails: security and content filtering (e.g., access control).
Evaluation: metrics such as Recall@K, plus user feedback loops (thumbs up/down).
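
A minimal sketch that ties these techniques together on toy data. The steps are a metadata filter by department, a keyword ranking, and Reciprocal Rank Fusion (RRF) to merge it with a semantic ranking, followed by Recall@K. Several parts are assumptions. The keyword scorer is a crude term-overlap stand-in for BM25. The semantic ranking is hard-coded in place of a real vector-store query. k=60 is a commonly used RRF constant. The documents, department labels, and function names are hypothetical.

```python
from collections import defaultdict


def keyword_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank doc ids by how many query terms they contain (crude stand-in for BM25)."""
    terms = set(query.lower().split())
    scores = {doc_id: len(terms & set(text.lower().split())) for doc_id, text in docs.items()}
    return [d for d in sorted(scores, key=scores.get, reverse=True) if scores[d] > 0]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over the lists containing it."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant docs that appear among the top-k retrieved results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


if __name__ == "__main__":
    docs = {
        "hr-1": "parental leave policy",
        "hr-2": "annual leave encashment rules",
        "hr-3": "leave of absence for medical reasons",
        "fin-1": "tax treatment of parental leave payments",
    }
    department = {"hr-1": "HR", "hr-2": "HR", "hr-3": "HR", "fin-1": "Finance"}
    # Metadata filter: an HR user only searches HR documents.
    allowed = {d: t for d, t in docs.items() if department[d] == "HR"}
    semantic = ["hr-2", "hr-3", "hr-1"]  # assumed vector-store ranking, already filtered
    keyword = keyword_rank("parental leave", allowed)
    hybrid = reciprocal_rank_fusion([semantic, keyword])
    print("hybrid:", hybrid)
    print("Recall@2 semantic:", recall_at_k(semantic, {"hr-1"}, 2))
    print("Recall@2 hybrid:  ", recall_at_k(hybrid, {"hr-1"}, 2))
```

In this toy case the rare term "parental" is what the keyword side catches. That pushes `hr-1` into the hybrid top 2, and the Finance document never enters the candidate set.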

***

### 4. Fine-Tuning RAGs

Prompt engineering vs fine-tuning: when to choose which.
Fine-tuning embedding models for domain-specific terminology.
Fine-tuning LLMs to reduce hallucination.
In-context learning (few-shot) vs adapter-based methods (LoRA, PEFT).
Quick demo or code snippet for embedding fine-tuning using a small domain dataset.

***

### 5. Multi-Modal RAG

Differences between text-only and multi-modal RAG.
Use cases: technical manuals (images+text), legal contracts (scanned images+tables), customer support (voice+chat+screenshots).
Building blocks: image/audio/video embeddings, unified vector stores, and multi-modal retrieval.
Show 1-2 examples of image+text embedding retrieval (even if not full code).

***

### 6. End-to-End Demo

Live build of the OneRAG app (FastAPI + LlamaIndex + OpenAI/Cohere embeddings + Chroma/Faiss vector store).
Show intermediate outputs:
Original files → cleaned chunks → embeddings (vectors) → vector store → retrieved context → final LLM output.
Include a basic UI (e.g., Streamlit or a simple web UI) so participants can ask their own queries.

***

### 7. Q&A and Wrap-up

Share code and sample datasets.
Discuss next steps: advanced tuning, a multi-modal deep dive, or agentic RAG.

***

## Outcomes:

Participants will understand the practical challenges and engineering decisions behind RAGs.
They will see a working enterprise-grade RAG app.
They will leave with a roadmap for multi-modal RAGs and fine-tuning.

***

## Deliverables from the meetup: