Deploying & Scaling LLMs in the Enterprise: Architecting Multi-Agent AI Systems


Details
Deploying and Scaling Large Language Models in the Enterprise: Architecting Multi-Agent AI Systems Integrating Vision, Data, and Responsible AI
Location: Virtual event
If you join remotely, you can submit questions via Zoom Q&A. Zoom link:
https://acm-org.zoom.us/j/97422303746?pwd=XGkOzZpT1w2Y6OMfxqw2s1IQYov1Dh.1
Large Language Models (LLMs) are rapidly reshaping enterprise AI, but real-world deployments demand far more than fine-tuning and API calls. They require sophisticated architectures capable of scaling inference, integrating multi-modal data streams, and enforcing responsible AI practices—all under the constraints of enterprise SLAs and cost considerations.
In this session, I’ll take a deep technical dive into architecting multi-agent AI systems that combine LLMs with computer vision and structured data pipelines. We’ll explore:
- Multi-Agent System Design: Architectural patterns for decomposing enterprise workflows into specialized LLM-driven agents, including communication protocols, context sharing, and state management.
- Vision-Language Integration: Engineering methods to fuse embeddings from computer vision models with LLM token streams for tasks such as visual question answering, document understanding, and real-time decision support.
- Optimization for GPU Inference: Detailed strategies for memory optimization, quantization, mixed-precision computation, and batching to achieve high throughput and low latency in LLM deployment on modern GPU hardware (e.g., NVIDIA A100/H100).
- Observability and Responsible AI: Techniques for building observability layers into LLM pipelines—capturing token-level traces, detecting drift, logging model confidence—and implementing fairness audits and risk mitigation protocols at runtime.
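To give a flavor of the first topic, the agent-decomposition pattern can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: `Agent`, `orchestrate`, and the stub handlers are hypothetical names, and the lambdas stand in for calls to an actual LLM inference endpoint. The key idea shown is context sharing, where each agent writes its result into a shared dict that downstream agents can read.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """One specialized step of a decomposed workflow."""
    name: str
    handle: Callable[[str, dict], str]  # (task, shared context) -> result

    def run(self, task: str, context: dict) -> str:
        result = self.handle(task, context)
        context[self.name] = result  # publish state for downstream agents
        return result

def orchestrate(agents: list[Agent], task: str) -> dict:
    """Run agents in sequence, threading shared context between them."""
    context: dict = {}
    for agent in agents:
        agent.run(task, context)
    return context

# Stub "LLM-driven" agents for illustration; real handlers would call a model.
extractor = Agent("extractor", lambda task, ctx: f"entities({task})")
analyzer = Agent("analyzer", lambda task, ctx: f"analysis({ctx['extractor']})")

ctx = orchestrate([extractor, analyzer], "invoice #123")
print(ctx["analyzer"])  # analysis(entities(invoice #123))
```

Production systems layer message protocols, retries, and observability hooks on top of this skeleton, but the decomposition-plus-shared-state shape is the same.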
Drawing on practical examples from large-scale enterprise deployments across retail, healthcare, and finance, I’ll discuss the engineering trade-offs, tooling stacks, and lessons learned in translating research-grade LLMs into production-grade systems.
This talk is designed for AI engineers and researchers eager to understand the technical complexities—and solutions—behind scaling multi-modal, responsible AI systems that deliver real business value.
Speaker Bio:
Dhanashree is a Senior Machine Learning Engineer and AI Researcher with over a decade of experience designing and deploying advanced AI systems at scale. Her expertise spans architecting multi-agent solutions that integrate Large Language Models (LLMs), computer vision pipelines, and structured data to solve complex enterprise challenges across industries including retail, healthcare, and finance.
At Albertsons, Deloitte, and Fractal, Dhanashree has led the development of production-grade AI applications, focusing on optimization, model observability, and responsible AI practices. Her work includes designing scalable inference architectures for LLMs on modern GPU infrastructures, building hybrid pipelines that fuse vision and language models, and engineering systems that balance performance with ethical and regulatory considerations.
She collaborates with research institutions such as the University of Illinois, engages actively with the research community, and frequently speaks on bridging advanced AI research and production systems.
https://www.linkedin.com/in/dhanashreelele/
Join via YouTube:
https://youtube.com/live/
AGENDA
7:00 SFBayACM upcoming events, introduce the speaker
7:15 speaker presentation starts
8:15 - 8:30 wrap-up, depending on Q&A
Join SF Bay ACM Chapter for an insightful discussion.
