Stop Guessing: How to Measure and Improve LLM Outputs
Most people use LLMs by feel: ask a question, read the answer, decide whether it “seems good,” and move on.
That works for casual use. It does not work when you are building software, automating workflows, writing important documents, or relying on AI for anything that needs to be repeatable.
In this talk, we’ll look at how to improve and evaluate the inputs and outputs of LLMs using practical measurement techniques. We’ll cover how prompt changes affect results, how to compare outputs, how to build simple evaluation sets, and how math-based methods like similarity scoring can help you move beyond guesswork.
This talk is beginner-friendly: even if you don't know anything about AI, you should get something out of it. It will be a little more technical than our intro talks, though. You do not need to be an AI researcher, but programmers and technically curious attendees will get the most out of it.
We’ll cover:
* Why “it looks good” is not enough
* How to improve prompts by changing the input, context, and constraints
* How to compare LLM outputs more systematically
* Basic evaluation techniques for accuracy, consistency, and usefulness
* How embeddings, cosine similarity, and scoring can help evaluate results
* Where automated evaluation works — and where humans still need to stay in the loop
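For a small taste of the similarity scoring mentioned in the list above, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the talk's actual material: `toy_embed` is a hypothetical stand-in that counts words, where a real setup would call an embedding model, and `score_output` is an illustrative helper name.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an all-zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(text, vocabulary):
    """Hypothetical stand-in for an embedding model: word counts over a vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def score_output(output, reference):
    """Score an LLM output against a reference answer (higher means more similar)."""
    vocabulary = sorted(set(output.lower().split()) | set(reference.lower().split()))
    return cosine_similarity(
        toy_embed(output, vocabulary),
        toy_embed(reference, vocabulary),
    )


if __name__ == "__main__":
    reference = "the capital of france is paris"
    candidates = [
        "paris is the capital of france",
        "i am not sure",
    ]
    for candidate in candidates:
        print(f"{score_output(candidate, reference):.2f}  {candidate}")
```

Because the toy vectors only count words, they ignore word order and meaning. Real embeddings capture much more, but the scoring step works the same way: turn both texts into vectors, then compare the vectors with a number instead of a gut feeling.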
By the end, you’ll have a practical mental model for treating LLMs less like magic and more like systems you can test, measure, and improve.
LOGISTICS AND PARKING:
The talk starts at 7:00 PM. The first half hour is reserved for everyone to get set up and mingle. Free pizza and drinks!
The cheapest option is street parking, which will only cost you a few bucks. Otherwise, park in the nearby veteran's museum lot for $8. We highly recommend avoiding the nearby $15 garage.