Fine-Tuning BERT for the Unstructured Data You Actually Have
Most fine-tuning attention goes to generative LLMs, but a large share of production NLP still runs on BERT-family encoders. They are small, fast, and cheap to serve, and on the tasks that make up most real-world text workloads (classifying support tickets, extracting fields from documents, routing emails, semantic search), a fine-tuned BERT often matches or beats a prompted frontier model at a fraction of the cost and latency.
In this hands-on workshop, we'll fine-tune an open-weight BERT model on a custom text dataset and deploy it behind a simple UI. Base BERT is small enough that full fine-tuning runs comfortably on a single GPU. The whole pipeline runs on Flyte 2/Union, so data prep is cached, runs are reproducible and recoverable, and the same code scales from a laptop to a cluster without rewrites.
By the end, you'll have a working fine-tuned model and a reusable pipeline you can point at your own unstructured data.
What we'll cover
- Where encoder models like BERT fit, and why they still win on classification, extraction, and embedding tasks
- Fine-tuning an open-weight BERT model with Hugging Face Transformers
- Orchestrating with Flyte 2: cached data prep, GPU-aware training, reproducible runs at any scale
- Deploying the model behind a simple UI, with a path to low-latency, scaled inference
What you'll leave with
- A fine-tuned BERT model trained on a custom dataset
- A reusable training and deployment pipeline you can adapt to your own unstructured data
- The knowledge to build and label datasets for classification and extraction tasks
- A portfolio-ready project you can adapt to a production scenario at work
Who it's for
ML engineers and practitioners working with unstructured text who want models that are cheap to run and easy to deploy. Whether you're prototyping at work, evaluating infrastructure for a production NLP use case, or building a portfolio project, you'll leave with code you can keep extending.
Hosted by Sage Elliott, AI Engineer at [Union.ai](https://atunion.ai/?utm_source=luma)
