Secure On-Premise Hosting of Large Language Models Using HPC and vLLM

Details
Large language models (LLMs) offer powerful tools for problem-solving and for simplifying complex concepts. They range in size from hundreds of millions to trillions of parameters, and the larger models often require distributed computing. However, externally hosted models pose security risks, especially for sensitive data. To mitigate these risks, on-premise hosting is essential. This meeting will demonstrate how vLLM can be used within a high-performance computing (HPC) environment to pool GPU resources across nodes, enabling the hosting of large models while exposing useful API endpoints for tasks such as embedding, chat completion, and reranking. Once HPC maintenance is complete, this setup will allow Moffitt users to access a secure, ChatGPT-like interface without any data leaving the internal network.
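As a concrete illustration of the API endpoints mentioned above: vLLM serves an OpenAI-compatible REST API, so a client only needs to build a standard chat-completion request and point it at the internal server. The sketch below shows the request body; the hostname, port, and model name are placeholders, since the actual values depend on how the HPC deployment is configured.

```python
import json

# Hypothetical base URL for the on-premise vLLM server; the real
# hostname and port are determined by the HPC deployment.
VLLM_BASE_URL = "http://hpc-node01:8000/v1"


def build_chat_request(prompt: str,
                       model: str = "meta-llama/Llama-3-70B-Instruct") -> dict:
    """Build the JSON body for vLLM's OpenAI-compatible
    /v1/chat/completions endpoint (the model name here is illustrative)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }


if __name__ == "__main__":
    # POSTing this body to f"{VLLM_BASE_URL}/chat/completions" (e.g. with
    # the requests library) would return a chat completion, with no data
    # ever leaving the internal network.
    body = build_chat_request("Summarize the key points of this document.")
    print(json.dumps(body, indent=2))
```

Because the API is OpenAI-compatible, existing client libraries and tools that speak that protocol can typically be redirected to the internal base URL with no other changes.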
