Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing Paper • 2601.04575 • Published Jan 8 • 10
OpenEnv Environment Hub Collection All OpenEnv-tagged environments on Hugging Face Hub • 173 items • Updated 23 days ago • 3
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 229