LLMs in production : Inference, RAG, and video models

Name: LLMs in production : Inference, RAG, and video models
Start: 2025-09-25T18:45:00+08:00
End: 2025-09-25T20:45:00+08:00
Location: Rakuten Event Space (Raffles Place MRT)

Hosted By

Martin A. and Sam W.

Details

This month is a bit different!

LOCATION : Near Raffles Place
SIGN-UP (including Waitlist) : [https://luma.com/heqr1dt6](https://luma.com/heqr1dt6)
NB : NO SIGN-UP VIA MEETUP, NO WAITLIST VIA MEETUP
FOOD : None

*HELP WANTED*
MLSG needs a few volunteers to help with logistics (like checking people into the event). If you're willing to help and want to give back to the community, please contact Martin by emailing my first name at reddragon.ai

Talks:

"Efficient LLM Fine-tuning for Semantic Search" - Dongzhe Wang

In this talk, Dongzhe will explore how large language models (LLMs) can be efficiently fine-tuned to power semantic search systems, using parameter-efficient fine-tuning techniques. Attendees will learn how LLMs are being applied to improve search in practice, along with the challenges of balancing quality and latency in real-world applications. Dongzhe is a Principal Research Scientist at Rakuten Asia, and obtained a Ph.D. from Nanyang Technological University before working at Shopee and Zhuiyi.

"Efficient Inference and Serving of LLMs and Large Video-Generative Models" - Jonathan Zhao

Jonathan will explore techniques for efficiently serving LLMs and large video-generative models. The session will cover methods to optimize inference performance alongside system-level strategies for scalable deployment, highlighting key differences in serving the two model types in practice. Attendees will gain insights into approaches for improving inference and serving, including how to balance quality against latency and other real-world challenges. Jonathan is a Senior Software Engineer at Rakuten, having previously done AI product development at a startup.

"IPhO Gold using Agentic Gemini" - Martin Andrews

A recent paper showed that Gemini 2.5 Pro, when driven in an agentic loop, could achieve Gold medal standard on the International Physics Olympiad (IPhO) theory questions. This comes hot on the heels of Google's internal version getting to Gold on the IMO (Mathematics). Martin will briefly talk about the IPhO, how the agentic system works, and show some of the actual questions (so you can see what's involved).

---

Talks will start at 7:00pm and end at around 8:45pm, at which point people normally come up to the front for a bit of a chat with each other and the speakers.

As always, we're actively looking for more speakers - both '30 minutes long-form' and lightning talks. For the lightning talks, we welcome folks to come and talk about something cool they've done with Keras, PyTorch, JAX and/or Deep Learning for 5-10 minutes (so, if you have slides, then #max=10). We believe that the key ingredient for the success of a Lightning Talk is simply the cool/interesting factor. It doesn't matter whether you're an expert or an enthusiastic beginner: given the responses we have had to previous talks, we're sure there are lots of people who would love to hear what you've been playing with. If you're interested in talking, please just introduce yourself to Martin at one of the events.
