Inside Reinforcement Learning for LLM Fine-Tuning
Details
Pre-training builds general language understanding, but it does not teach models how to behave in line with human expectations. RLHF and post-training refine model behavior to be helpful, safe, and aligned with real-world usage.
You will learn:
- What reward models are and their role in aligning AI with human preferences (a brief illustrative sketch follows this list)
- The RLHF workflow and how policy optimization works
- Practical examples of how RLHF changes model behavior
- A high-level view of RLHF tools and infrastructure
- Future directions including Agentic AI and preference-driven learning
- How RLHF, reward modeling, and post-training together transform a capable language model into a trustworthy, user-aligned AI system
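
To give a flavor of the reward-model topic above, here is a minimal, illustrative sketch of the pairwise (Bradley–Terry) preference loss commonly used when training reward models from human comparisons. The `TinyRewardModel` class, its vocabulary size, and its dimensions are hypothetical placeholders for illustration only, not material from the session.

```python
# Illustrative only: a toy pairwise reward-model training step.
# A reward model scores a (prompt + response) sequence; the Bradley-Terry
# loss pushes the score of the human-preferred response above the rejected one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Hypothetical toy scorer: mean-pooled token embeddings -> scalar reward."""
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> one scalar reward per sequence
        pooled = self.embed(token_ids).mean(dim=1)
        return self.score(pooled).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyRewardModel()
    # Fake tokenized (prompt + response) pairs standing in for labeled comparisons.
    chosen = torch.randint(0, 1000, (8, 16))
    rejected = torch.randint(0, 1000, (8, 16))
    loss = preference_loss(model(chosen), model(rejected))
    loss.backward()
    print(f"pairwise preference loss: {loss.item():.4f}")
```

In a full RLHF pipeline, a reward model trained this way then provides the scalar signal that policy optimization (for example, PPO-style updates) uses to fine-tune the language model.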
Target Audience: This session is for engineers who want to understand the internals of LLM post-training, including supervised fine-tuning and reinforcement learning. It’s ideal for software, ML, and data engineers curious about how models are aligned with human preferences.
About Speaker: Sumeet Agrawal is a software engineer at Avalara with over 14 years of experience. His interests span AI, software development, information retrieval, and data engineering, and he enjoys applying these disciplines to build data-driven, intelligent platforms.
LinkedIn Profile: https://www.linkedin.com/in/sumeet-agrawal1/
Note: The speaker's views are their own and do not represent Avalara. This session is intended for educational purposes only and does not constitute specific advice or endorsement from Avalara or its organiser. Additionally, this meetup event does not include any Avalara-specific content.
