Hugging Face Blog · 10 Mar

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

infrastructure · research

Hugging Face researchers conducted a comprehensive survey of 16 open-source reinforcement learning libraries to identify best practices for asynchronous RL training at scale.

The core problem in synchronous RL training is that data generation through model inference dominates wall-clock time. A single batch of 32K-token rollouts from a 32-billion-parameter model can take hours while the training GPUs sit idle.

The solution that has emerged is to disaggregate inference and training onto separate GPU pools. These pools are connected by a rollout buffer and transfer weights asynchronously, so neither side waits for the other.
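A minimal sketch of that dataflow, using a bounded queue as the rollout buffer and threads standing in for the two GPU pools. All names here (`Rollout`, `run_disaggregated`, the version counter) are illustrative, not from any surveyed library:

```python
# Disaggregated RL sketch: an inference producer fills a rollout buffer
# while a trainer consumer drains it; neither side waits for the other.
import queue
import threading
from dataclasses import dataclass


@dataclass
class Rollout:
    tokens: list          # generated token ids (dummy data here)
    policy_version: int   # weight version that produced this rollout


def run_disaggregated(num_rollouts=8, batch_size=4):
    buffer = queue.Queue(maxsize=16)   # the rollout buffer between pools
    weight_version = [0]               # bumped by the trainer after each step

    def inference_loop():
        for i in range(num_rollouts):
            # Generation reads whatever weights are current *now*;
            # it never blocks on a training step finishing.
            buffer.put(Rollout(tokens=[i], policy_version=weight_version[0]))

    def training_loop(consumed):
        for step in range(num_rollouts // batch_size):
            batch = [buffer.get() for _ in range(batch_size)]
            consumed.extend(batch)
            # Stand-in for the asynchronous weight push back to inference.
            weight_version[0] = step + 1

    consumed = []
    producer = threading.Thread(target=inference_loop)
    consumer = threading.Thread(target=training_loop, args=(consumed,))
    producer.start(); consumer.start()
    producer.join(); consumer.join()
    return consumed
```

Tagging each rollout with the policy version that generated it is what later makes staleness measurable on the training side.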

The researchers compared libraries across seven axes: orchestration primitives, rollout buffer design, weight synchronization protocols, staleness management, partial rollout handling, LoRA training support, and distributed training backends.

Key findings reveal that Ray dominates orchestration, used by 8 of the 16 surveyed libraries. NVIDIA's NCCL broadcast has become the default protocol for transferring updated weights from the training pool to the inference pool.
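The broadcast step can be illustrated with a stdlib-only mock. In a real system this would be `torch.distributed.broadcast` over an NCCL process group spanning both GPU pools; the `InferenceReplica` class and the dict-of-lists "weights" below are purely illustrative stand-ins for that dataflow:

```python
# Mock of the weight-sync protocol: the training rank pushes its current
# parameters to every inference replica, stamped with a version number.
import copy


class InferenceReplica:
    def __init__(self):
        self.weights = None
        self.version = -1

    def receive_broadcast(self, weights, version):
        # Real systems: torch.distributed.broadcast(bucket, src=trainer_rank)
        # on flattened parameter buckets, GPU to GPU over NCCL.
        self.weights = copy.deepcopy(weights)
        self.version = version


def broadcast_weights(trainer_weights, version, replicas):
    for replica in replicas:
        replica.receive_broadcast(trainer_weights, version)


replicas = [InferenceReplica() for _ in range(4)]
broadcast_weights({"layer0.w": [0.1, 0.2]}, version=1, replicas=replicas)
```

The version stamp is the hook for staleness management: the trainer can compare its own version against the one recorded on each incoming rollout.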

Staleness management approaches range from simply dropping outdated samples to using advanced importance-sampling correction techniques. LoRA training support remains sparse across the surveyed libraries.
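The two ends of that spectrum can be sketched in a few lines. The `max_lag` threshold and the ratio clip value are illustrative choices, not any specific library's defaults:

```python
# Two staleness policies for off-policy rollouts.
import math


def filter_stale(rollouts, current_version, max_lag=2):
    """Policy 1: drop samples generated by weights more than max_lag
    versions behind the trainer's current weights."""
    return [r for r in rollouts if current_version - r["version"] <= max_lag]


def importance_weight(logp_current, logp_behavior, clip=5.0):
    """Policy 2: keep stale samples but reweight each one by the
    truncated ratio pi_current / pi_behavior of token log-probs."""
    return min(math.exp(logp_current - logp_behavior), clip)
```

Dropping is simple but wastes generated tokens; importance-sampling correction keeps them at the cost of extra log-prob bookkeeping and truncation bias.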

Distributed Mixture-of-Experts support is emerging as a key differentiator for next-generation systems, with implications for critic-free algorithms, process rewards, and multi-agent co-evolution workflows.