Secure On-Premise Hosting of Large Language Models Using HPC and vLLM

Details
Large language models (LLMs) offer powerful tools for problem-solving and for simplifying complex concepts. They range in size from hundreds of millions to trillions of parameters, and the larger models often require distributed computing. However, externally hosted models pose security risks, especially for sensitive data. To mitigate these risks, on-premise hosting is essential. This meeting will demonstrate how vLLM can be used within a high-performance computing (HPC) environment to pool GPU resources across nodes, enabling the hosting of large models while exposing useful API endpoints for tasks such as embedding, chat completion, and reranking. Once HPC maintenance is complete, this setup will allow Moffitt users to access a secure, ChatGPT-like interface without any data leaving the internal network.
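As a concrete illustration of the API endpoints mentioned above: vLLM serves an OpenAI-compatible REST API, so a client only needs to build a standard chat-completion request and point it at the internal server. The sketch below shows the request body; the hostname, port, and model name are placeholders, since the actual values depend on how the HPC deployment is configured.

```python
import json

# Hypothetical base URL for the on-premise vLLM server; the real
# hostname and port are determined by the HPC deployment.
VLLM_BASE_URL = "http://hpc-node01:8000/v1"


def build_chat_request(prompt: str,
                       model: str = "meta-llama/Llama-3-70B-Instruct") -> dict:
    """Build the JSON body for vLLM's OpenAI-compatible
    /v1/chat/completions endpoint (the model name here is illustrative)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }


if __name__ == "__main__":
    # POSTing this body to f"{VLLM_BASE_URL}/chat/completions" (e.g. with
    # the requests library) would return a chat completion, with no data
    # ever leaving the internal network.
    body = build_chat_request("Summarize the key points of this document.")
    print(json.dumps(body, indent=2))
```

Because the API is OpenAI-compatible, existing client libraries and tools that speak that protocol can typically be redirected to the internal base URL with no other changes.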
