Paper Group: Reinforcement Learning with Action Chunking (Q-chunking)


Details
Join us for a paper discussion on "Reinforcement Learning with Action Chunking (Q-chunking)"
Examining offline-to-online RL improvements via temporally extended action sequences
Featured Paper:
"Reinforcement Learning with Action Chunking" (Li, Zhou, Levine, 2025)
arXiv Paper
Discussion Topics:
Q-Chunking Design Principles
- RL operates on sequences of actions (chunks), not single-step actions
- Both policy and critic predict/evaluate multi-step action chunks
- Enables unbiased multi-step temporal-difference (TD) backups and improved value propagation
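To make the last point concrete, here is a minimal sketch (assuming a generic critic(obs, action_chunk) callable and a proposed next chunk, not the paper's exact code) of the h-step target that bootstraps only at chunk boundaries:

```python
def chunk_td_target(critic, rewards, next_obs, next_chunk, gamma=0.99):
    """h-step TD target over an action chunk (sketch; `critic` and `next_chunk`
    are placeholders for the target critic and the policy's next proposal)."""
    h = len(rewards)  # per-step rewards observed while executing the h actions of the chunk
    n_step_return = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Bootstrapping only at the chunk boundary keeps the multi-step backup unbiased:
    # Q(s_t, a_{t:t+h}) <- sum_i gamma^i * r_{t+i} + gamma^h * Q(s_{t+h}, a_{t+h:t+2h})
    return n_step_return + (gamma ** h) * critic(next_obs, next_chunk)
```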
Behavior Constraints for Exploration
- Uses a flow-matching behavior policy to capture non-Markovian patterns in the offline data (a training-loss sketch follows this list)
- Implicit or explicit behavior constraints ensure temporally coherent exploration
- Avoids overly restrictive Gaussian policies, which struggle to model multimodal action-chunk distributions
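As a rough illustration of the flow-matching behavior policy, the sketch below regresses a velocity network onto linear noise-to-data paths over whole action chunks; velocity_net(obs, x_t, t) is an assumed interface, not the authors' code:

```python
import torch

def flow_matching_loss(velocity_net, obs, action_chunk):
    """Conditional flow-matching loss for a behavior policy over action chunks.
    action_chunk: (batch, h, action_dim) chunks sliced from the offline dataset."""
    x0 = torch.randn_like(action_chunk)                        # noise endpoint
    t = torch.rand(action_chunk.shape[0], 1, 1, device=action_chunk.device)
    x_t = (1.0 - t) * x0 + t * action_chunk                    # point on the linear path
    target_velocity = action_chunk - x0                        # velocity along that path
    pred_velocity = velocity_net(obs, x_t, t)                  # assumed network interface
    return ((pred_velocity - target_velocity) ** 2).mean()
```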
Implementation & Algorithms
- Simple two-part training recipe: (1) fit an action-chunk behavior policy via flow matching, (2) train the critic on action chunks with TD backups
- The QC-FQL variant adds a Wasserstein-distance behavior constraint for policy regularization
- Best-of-N sampling draws multiple candidate action chunks from the behavior policy and executes the one with the highest Q-value
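The best-of-N step could look roughly like this (a sketch assuming hypothetical behavior_policy.sample and per-network q(obs, chunk) interfaces, with a pessimistic minimum over the critic ensemble):

```python
import torch

def best_of_n_chunk(behavior_policy, q_ensemble, obs, n=32):
    """Sample N candidate action chunks and execute the one the critics score highest."""
    candidates = behavior_policy.sample(obs, num_samples=n)   # (n, h, action_dim), assumed API
    obs_batch = obs.unsqueeze(0).expand(n, -1)                # repeat a 1-D obs for each candidate
    # Pessimistic value: take the minimum over the critic ensemble, then pick the best chunk.
    q_values = torch.stack([q(obs_batch, candidates) for q in q_ensemble]).min(dim=0).values
    return candidates[q_values.argmax()]
```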
Performance Benchmarks
- Evaluated on OGBench (scene-sparse, puzzle-3x3-sparse, cube-double/triple/quadruple) and robomimic (lift, can, square) domains
- Strong offline performance—matches or exceeds prior methods in pretraining
- Major online sample efficiency gains, especially on hard long-horizon tasks (cube-triple/quadruple)
- Outperforms 1-step and n-step return baselines and existing offline-to-online RL methods
Implementation Challenges
- Hyperparameter tuning for chunk length and critic ensemble size
- Ensuring behavior policy can model complex, temporally extended action sequences
- Best-of-N sampling adds computational cost but yields better exploration
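The knobs above usually sit in a small config; the values below are purely illustrative placeholders, not the settings reported in the paper:

```python
# Illustrative Q-chunking hyperparameters (placeholder values, not the paper's).
qc_config = {
    "chunk_length": 5,          # h: low-level actions per chunk
    "critic_ensemble_size": 2,  # number of Q-networks; take the min over them for pessimism
    "best_of_n": 32,            # candidate chunks scored per action selection
    "discount": 0.99,
    "flow_steps": 10,           # integration steps when sampling from the flow policy
}
```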
Key Technical Features
- The Q-function evaluates the entire action chunk, allowing unbiased multi-step value learning
- Temporally coherent actions lead to better state exploration and reward discovery
- Action chunking transforms non-Markovian policy challenges into tractable RL objectives
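One way to picture the sequence-level critic is to flatten the chunk and feed it alongside the state; a minimal sketch where the layer sizes and interface are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ChunkCritic(nn.Module):
    """Q-function that scores a whole h-step action chunk in one forward pass."""
    def __init__(self, obs_dim, action_dim, h, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + h * action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, action_chunk):
        # action_chunk: (batch, h, action_dim) -> flatten so the critic conditions on the full sequence
        flat_chunk = action_chunk.reshape(action_chunk.shape[0], -1)
        return self.net(torch.cat([obs, flat_chunk], dim=-1)).squeeze(-1)
```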
Future Directions
- Automating chunk-length selection and chunk boundary detection
- Extending chunking to general non-Markovian/fine-grained feedback control settings
- Integrating chunked learning into broader RL architectures
---
Silicon Valley Generative AI has two meeting formats:
1. Paper Reading - Every second week we meet to discuss machine learning papers. This is a collaboration between Silicon Valley Generative AI and Boulder Data Science.
2. Talks - Once a month we meet to have someone present on a topic related to generative AI. Speakers range from industry leaders, researchers, and startup founders to subject matter experts and anyone with an interest in a topic they would like to share. Topics vary from technical to business focused: how the latest generative models work and how they can be used, applications and adoption of generative AI, demos of projects and startup pitches, or legal and ethical topics. The talks are meant to be inclusive and aimed at a more general audience than the paper readings.
If you would like to be a speaker or suggest a paper, email us at svb.ai.paper.suggestions@gmail.com or join our new Discord!
