Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published Sep 8, 2025 • 79
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published Feb 12, 2025 • 58
Group-in-Group Policy Optimization for LLM Agent Training Paper • 2505.10978 • Published May 16, 2025 • 18
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025 • 273
MiniMax-M1 Collection MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. • 6 items • Updated Oct 21, 2025 • 119
Is Extending Modality The Right Path Towards Omni-Modality? Paper • 2506.01872 • Published Jun 2, 2025 • 23
Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents Paper • 2505.02156 • Published May 4, 2025 • 18
TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence Paper • 2505.24500 • Published May 30, 2025 • 12
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals Paper • 2406.04784 • Published Jun 7, 2024 • 2
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics Paper • 2506.00070 • Published May 29, 2025 • 29
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning Paper • 2506.03136 • Published Jun 3, 2025 • 25
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Paper • 2506.03143 • Published Jun 3, 2025 • 53
ARIA: Training Language Agents with Intention-Driven Reward Aggregation Paper • 2506.00539 • Published May 31, 2025 • 30
ATLAS: Learning to Optimally Memorize the Context at Test Time Paper • 2505.23735 • Published May 29, 2025 • 22
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26, 2025 • 45