Building Local LLM Applications with llama.cpp: Modern C++ in Action


Details
Hello all!
Welcome to our final meetup before the summer break. This session will introduce llama.cpp, a lightweight, open-source C++ library designed for efficient large language model (LLM) inference on a wide range of hardware—including CPUs and GPUs from Nvidia, AMD, and Apple. llama.cpp is the engine behind popular projects like Ollama and GPT4All, and it enables developers to run state-of-the-art LLMs locally, even on laptops and resource-constrained devices.
We are kindly hosted by Consat this evening.
Abstract
This talk will take a practical, application-driven approach to leveraging llama.cpp in modern C++ projects. We will cover:
- A high-level introduction to LLMs, focusing on practical applications rather than mathematical theory.
- Where to find open-source LLM models and how to convert them to the GGUF format for use with llama.cpp.
- How to integrate llama.cpp into your C++ projects using #include "llama.h", with live demonstrations of running inference on a laptop (a minimal code sketch follows this list).
- An overview of Retrieval Augmented Generation (RAG) systems from an application perspective—what they are, the benefits for custom data, and how to build a simple RAG pipeline with llama.cpp as the inference engine.
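To give a flavour of what the integration looks like, here is a rough sketch of a prompt-to-text loop in C++. It loosely follows the `simple` example in the llama.cpp repository; the C API changes between releases, so the exact function names used here (llama_model_load_from_file, llama_init_from_model, and so on) should be checked against the llama.h in your own checkout. Error handling is omitted, and the model path and prompt are placeholders.

```cpp
// Minimal sketch, loosely following llama.cpp's "simple" example.
// Function names reflect the C API at the time of writing and may
// differ in your version of llama.h -- treat this as an outline.
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    const std::string model_path = "model.gguf";        // placeholder GGUF file
    const std::string prompt     = "Hello, my name is"; // placeholder prompt
    const int n_predict          = 32;                  // tokens to generate

    llama_backend_init();

    // Load a GGUF model from disk.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file(model_path.c_str(), mparams);
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // Tokenize the prompt (a first call with a null buffer reports the size needed).
    const int n_prompt = -llama_tokenize(vocab, prompt.c_str(), prompt.size(),
                                         nullptr, 0, true, true);
    std::vector<llama_token> tokens(n_prompt);
    llama_tokenize(vocab, prompt.c_str(), prompt.size(),
                   tokens.data(), tokens.size(), true, true);

    // Create an inference context sized for prompt plus generated tokens.
    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx   = n_prompt + n_predict;
    cparams.n_batch = n_prompt;
    llama_context * ctx = llama_init_from_model(model, cparams);

    // Greedy sampler: always pick the most likely next token.
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // Feed the prompt, then generate one token at a time.
    llama_batch batch = llama_batch_get_one(tokens.data(), tokens.size());
    for (int i = 0; i < n_predict; ++i) {
        llama_decode(ctx, batch);
        llama_token tok = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, tok)) break;   // end-of-generation token

        char piece[128];
        const int n = llama_token_to_piece(vocab, tok, piece, sizeof(piece), 0, true);
        std::printf("%.*s", n, piece);

        batch = llama_batch_get_one(&tok, 1);        // next step decodes the new token
    }
    std::printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Building a program like this amounts to compiling against the headers and library produced by the llama.cpp build (for example by adding the repository as a CMake subdirectory).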
Hopefully, by the end of the session, you’ll understand how to use llama.cpp to build efficient, private, and customizable LLM-powered applications in modern C++, and how to set up a RAG system for your own data and business needs.
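To make the RAG idea a little more concrete, here is a toy sketch of the retrieval step only. It assumes your document chunks and their embeddings have been computed ahead of time (an embedding model run through llama.cpp can produce them); the Chunk struct and build_rag_prompt function are illustrative names invented for this sketch, not part of llama.cpp. The prompt it returns would then be fed to an inference loop like the one above.

```cpp
// Toy sketch of the retrieval step in a simple RAG pipeline.
// Embeddings are assumed to be precomputed and stored with each chunk.
#include <cmath>
#include <string>
#include <vector>

struct Chunk {
    std::string        text;       // a passage from your own documents
    std::vector<float> embedding;  // precomputed embedding of that passage
};

float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    float dot = 0.f, na = 0.f, nb = 0.f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-9f);
}

// Pick the chunk most similar to the question embedding and build a
// prompt that asks the model to answer from that context only.
std::string build_rag_prompt(const std::string & question,
                             const std::vector<float> & question_embedding,
                             const std::vector<Chunk> & chunks) {
    size_t best = 0;
    float  best_score = -1.f;
    for (size_t i = 0; i < chunks.size(); ++i) {
        const float s = cosine_similarity(question_embedding, chunks[i].embedding);
        if (s > best_score) { best_score = s; best = i; }
    }
    return "Answer using only this context:\n" + chunks[best].text +
           "\n\nQuestion: " + question + "\nAnswer:";
}
```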
Explore the library: https://github.com/ggml-org/llama.cpp
AGENDA
18:00 - Doors open, with fika/snacks/food and mingling
18:30 - Event starts, welcome from Consat
18:40 - Info about GbgCpp
18:45 - Presentation starts
19:50 - Wrap-up and closing
20:00 - Event finishes
We look forward to seeing you and diving into hands-on LLM development with modern C++!
