Paper Group: Reinforcement Learning with Action Chunking (Q-chunking)


Details
Join us for a paper discussion on "Reinforcement Learning with Action Chunking (Q-chunking)"
Examining offline-to-online RL improvements via temporally extended action sequences
Featured Paper:
"Reinforcement Learning with Action Chunking" (Li, Zhou, Levine, 2025)
arXiv Paper
Discussion Topics:
Q-Chunking Design Principles
- RL operates on sequences of actions (chunks), not single-step actions
- Both policy and critic predict/evaluate multi-step action chunks
- Enables unbiased multi-step temporal-difference (TD) backups and improved value propagation
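To make the last point concrete, here is a minimal sketch (assuming a generic critic(obs, action_chunk) callable and a proposed next chunk, not the paper's exact code) of the h-step target that bootstraps only at chunk boundaries:

```python
def chunk_td_target(critic, rewards, next_obs, next_chunk, gamma=0.99):
    """h-step TD target over an action chunk (sketch; `critic` and `next_chunk`
    are placeholders for the target critic and the policy's next proposal)."""
    h = len(rewards)  # per-step rewards observed while executing the h actions of the chunk
    n_step_return = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Bootstrapping only at the chunk boundary keeps the multi-step backup unbiased:
    # Q(s_t, a_{t:t+h}) <- sum_i gamma^i * r_{t+i} + gamma^h * Q(s_{t+h}, a_{t+h:t+2h})
    return n_step_return + (gamma ** h) * critic(next_obs, next_chunk)
```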
Behavior Constraints for Exploration
- Uses a flow-matching behavior policy to capture non-Markovian patterns in the offline data (a training-loss sketch follows this list)
- Implicit or explicit behavior constraints ensure temporally coherent exploration
- Avoids overly restrictive Gaussian policies, which struggle to model multimodal action-chunk distributions
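As a rough illustration of the flow-matching behavior policy, the sketch below regresses a velocity network onto linear noise-to-data paths over whole action chunks; velocity_net(obs, x_t, t) is an assumed interface, not the authors' code:

```python
import torch

def flow_matching_loss(velocity_net, obs, action_chunk):
    """Conditional flow-matching loss for a behavior policy over action chunks.
    action_chunk: (batch, h, action_dim) chunks sliced from the offline dataset."""
    x0 = torch.randn_like(action_chunk)                        # noise endpoint
    t = torch.rand(action_chunk.shape[0], 1, 1, device=action_chunk.device)
    x_t = (1.0 - t) * x0 + t * action_chunk                    # point on the linear path
    target_velocity = action_chunk - x0                        # velocity along that path
    pred_velocity = velocity_net(obs, x_t, t)                  # assumed network interface
    return ((pred_velocity - target_velocity) ** 2).mean()
```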
Implementation & Algorithms
- Simple two-part training recipe: (1) fit an action-chunk behavior policy via flow matching, (2) train the critic on action chunks with TD backups
- The QC-FQL variant adds a Wasserstein-distance behavior constraint for policy regularization
- Best-of-N sampling draws multiple candidate action chunks from the behavior policy and executes the one with the highest Q-value
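The best-of-N step could look roughly like this (a sketch assuming hypothetical behavior_policy.sample and per-network q(obs, chunk) interfaces, with a pessimistic minimum over the critic ensemble):

```python
import torch

def best_of_n_chunk(behavior_policy, q_ensemble, obs, n=32):
    """Sample N candidate action chunks and execute the one the critics score highest."""
    candidates = behavior_policy.sample(obs, num_samples=n)   # (n, h, action_dim), assumed API
    obs_batch = obs.unsqueeze(0).expand(n, -1)                # repeat a 1-D obs for each candidate
    # Pessimistic value: take the minimum over the critic ensemble, then pick the best chunk.
    q_values = torch.stack([q(obs_batch, candidates) for q in q_ensemble]).min(dim=0).values
    return candidates[q_values.argmax()]
```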
Performance Benchmarks
- Evaluated on OGBench (scene-sparse, puzzle-3x3-sparse, cube-double/triple/quadruple) and robomimic (lift, can, square) domains
- Strong offline performance—matches or exceeds prior methods in pretraining
- Major online sample efficiency gains, especially on hard long-horizon tasks (cube-triple/quadruple)
- Outperforms 1-step and n-step return baselines and existing offline-to-online RL methods
Implementation Challenges
- Hyperparameter tuning for chunk length and critic ensemble size
- Ensuring behavior policy can model complex, temporally extended action sequences
- Best-of-N sampling adds computational cost but yields better exploration
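The knobs above usually sit in a small config; the values below are purely illustrative placeholders, not the settings reported in the paper:

```python
# Illustrative Q-chunking hyperparameters (placeholder values, not the paper's).
qc_config = {
    "chunk_length": 5,          # h: low-level actions per chunk
    "critic_ensemble_size": 2,  # number of Q-networks; take the min over them for pessimism
    "best_of_n": 32,            # candidate chunks scored per action selection
    "discount": 0.99,
    "flow_steps": 10,           # integration steps when sampling from the flow policy
}
```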
Key Technical Features
- The Q-function evaluates the entire action chunk, allowing unbiased multi-step value learning
- Temporally coherent actions lead to better state exploration and reward discovery
- Action chunking transforms non-Markovian policy challenges into tractable RL objectives
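One way to picture the sequence-level critic is to flatten the chunk and feed it alongside the state; a minimal sketch where the layer sizes and interface are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ChunkCritic(nn.Module):
    """Q-function that scores a whole h-step action chunk in one forward pass."""
    def __init__(self, obs_dim, action_dim, h, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + h * action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, action_chunk):
        # action_chunk: (batch, h, action_dim) -> flatten so the critic conditions on the full sequence
        flat_chunk = action_chunk.reshape(action_chunk.shape[0], -1)
        return self.net(torch.cat([obs, flat_chunk], dim=-1)).squeeze(-1)
```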
Future Directions
- Automating chunk-length selection and chunk boundary detection
- Extending chunking to general non-Markovian/fine-grained feedback control settings
- Integrating chunked learning into broader RL architectures
---
Silicon Valley Generative AI has two meeting formats:
1. Paper Reading - Every second week we meet to discuss machine learning papers. This is a collaboration between Silicon Valley Generative AI and Boulder Data Science.
2. Talks - Once a month we meet to have someone present on a topic related to generative AI. Speakers range from industry leaders, researchers, and startup founders to subject matter experts and anyone with an interest in a topic they would like to share. Topics vary from technical to business focused: how the latest generative models work and how they can be used, applications and adoption of generative AI, demos of projects and startup pitches, or legal and ethical topics. The talks are meant to be inclusive and aimed at a more general audience than the paper readings.
If you would like to be a speaker or suggest a paper, email us at svb.ai.paper.suggestions@gmail.com or join our new Discord!
