Running LLMs using BigDL-LLM on Intel Laptops and GPUs
Details
LOCATION ADDRESS (Hybrid; in person or by Zoom, you choose)
Hacker Dojo
855 Maude Ave
Mountain View, CA 94043
(For faster sign-in, read the Hacker Dojo policies. When you sign up, state "I accept the Hacker Dojo policies".)
If you want to join remotely, you can submit questions via Zoom Q&A. The Zoom link:
https://acm-org.zoom.us/j/93784936096?pwd=QVZETjVBN0ZuZnNsZ0F2VFdTL3FkUT09
AGENDA
6:30 Door opens, Food
7:00 SFBayACM upcoming events, introduce the speaker
7:10 presentation starts
8:15-8:30 finish, depending on Q&A
ABSTRACT
As large language models (LLMs) continue to grow in size, their memory requirements and computational demands increase, making efficient quantization to compress LLMs into a more compact form increasingly urgent. (https://stackoverflow.blog/2023/08/23/fitting-ai-models-in-your-pocket-with-quantization/). Additionally, optimization across platforms is pivotal for broadening the accessibility of LLMs. BigDL-LLM (https://github.com/intel-analytics/BigDL) is designed to make efficient LLM development accessible to all Intel platform users, spanning from CPUs to GPUs, from clients to the cloud.
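To make the idea of low-bit compression concrete, here is a minimal sketch of symmetric INT4 quantization in plain Python. The function names and the per-block scheme are illustrative, not BigDL-LLM's actual implementation:

```python
# Minimal sketch of symmetric INT4 quantization, the kind of low-bit
# compression applied to LLM weights. Names here are illustrative.

def quantize_int4(weights):
    """Map float weights to 4-bit signed integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7.0  # one shared scale per block
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.07]
q, scale = quantize_int4(weights)
approx = dequantize_int4(q, scale)
# Each original FP32 weight (32 bits) becomes a 4-bit code plus a shared
# scale: roughly an 8x size reduction at the cost of small rounding error.
```

The rounding error of each recovered weight is bounded by half the scale, which is why quantizing small blocks of weights (each with its own scale) preserves accuracy better than one scale for the whole tensor.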
BigDL-LLM is an open-source library designed to run large language models (LLMs) using low-bit optimizations (FP4/INT4/NF4/FP8/INT8) on Intel XPU, for any PyTorch model, with very low latency and a small memory footprint. BigDL-LLM incorporates a variety of low-bit technologies including llama.cpp, GPTQ, bitsandbytes, QLoRA, and more. With bigdl-llm, users can build and run LLM applications for both inference and fine-tuning, using standard PyTorch APIs (e.g., HuggingFace Transformers and LangChain) on Intel platforms. Meanwhile, a wide range of models (such as LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, Dolly/Dolly-v2, Bloom, StarCoder, Whisper, InternLM, Baichuan, QWen, MOSS, etc.) have already been verified and optimized on bigdl-llm.
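As a sketch of that Transformers-style workflow (assuming the bigdl-llm package is installed and a Llama 2 checkpoint is available; the model path and prompt below are illustrative):

```python
# Sketch: loading a model with BigDL-LLM's low-bit optimizations through its
# HuggingFace Transformers-style API. Assumes `pip install bigdl-llm[all]`
# and access to a Llama 2 checkpoint; the path is illustrative.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint

# load_in_4bit=True applies INT4 quantization as the weights are loaded,
# shrinking the memory footprint enough to run on a laptop CPU.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = LlamaTokenizer.from_pretrained(model_path)

prompt = "What is low-bit quantization?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The key point is that only the import changes relative to stock HuggingFace Transformers; the rest of the inference code stays the same.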
The presentation will walk the audience through the process of optimizing a Llama 2 model utilizing the BigDL-LLM library, and offer a practical session on deploying a chatbot built on Llama 2 on an Intel laptop. Subsequently, a detailed walkthrough of the material will be covered as part of a broader workshop on LLM Agents. We invite everyone to join us in exploring this exciting journey with Intel BigDL-LLM.
SPEAKER BIOs
Jiao (Jennie) Wang is an AI Framework Engineer on the Machine Learning Platform team at Intel, working in the area of AI and big data analytics. She is a key contributor to developing and optimizing distributed ML/DL frameworks and provides customer support for end-to-end AI solutions.
Guoqiong Song: An AI Frameworks Engineer at Intel, with a focus on building end-to-end AI applications within the AI Software Engineering department. She has a PhD in Atmospheric and Oceanic Sciences from UCLA, specializing in quantitative modeling and data analysis. Previously, Guoqiong worked at Verizon as a data scientist.
https://www.linkedin.com/in/guoqiong-song-903aa759/