-
Guided Self-Evolving LLMs with Minimal Human Supervision
Paper • 2512.02472 • Published • 48 -
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper • 2509.25454 • Published • 140 -
Video Reasoning without Training
Paper • 2510.17045 • Published • 7 -
Agent Learning via Early Experience
Paper • 2510.08558 • Published • 266
Collections
Discover the best community collections!
Collections including paper arxiv:2510.03561
-
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 497 -
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
Paper • 2510.07499 • Published • 48 -
Improving Context Fidelity via Native Retrieval-Augmented Reasoning
Paper • 2509.13683 • Published • 8 -
Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering
Paper • 2509.00798 • Published • 1
-
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper • 2510.03561 • Published • 24 -
ReactiveAI/RxT-Alpha-Supervised
Text Generation • 0.2B • Updated -
ReactiveAI/RxT-Alpha-Mini-Supervised
Text Generation • 0.1B • Updated -
ReactiveAI/RxT-Alpha-Micro-Supervised
Text Generation • 28.8M • Updated • 1
-
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
Paper • 2502.15425 • Published • 9 -
EgoLife: Towards Egocentric Life Assistant
Paper • 2503.03803 • Published • 46 -
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 85
-
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper • 2510.03561 • Published • 24 -
ReactiveAI/RxT-Alpha-Supervised
Text Generation • 0.2B • Updated -
ReactiveAI/RxT-Alpha-Mini-Supervised
Text Generation • 0.1B • Updated -
ReactiveAI/RxT-Alpha-Micro-Supervised
Text Generation • 28.8M • Updated • 1
-
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Paper • 2509.13761 • Published • 16 -
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
Paper • 2509.25849 • Published • 47 -
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper • 2510.03561 • Published • 24 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 497
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 510 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 301 -
Lizard: An Efficient Linearization Framework for Large Language Models
Paper • 2507.09025 • Published • 18 -
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Paper • 2507.23632 • Published • 6 -
Causal Attention with Lookahead Keys
Paper • 2509.07301 • Published • 21
-
Guided Self-Evolving LLMs with Minimal Human Supervision
Paper • 2512.02472 • Published • 48 -
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper • 2509.25454 • Published • 140 -
Video Reasoning without Training
Paper • 2510.17045 • Published • 7 -
Agent Learning via Early Experience
Paper • 2510.08558 • Published • 266
-
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 497 -
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
Paper • 2510.07499 • Published • 48 -
Improving Context Fidelity via Native Retrieval-Augmented Reasoning
Paper • 2509.13683 • Published • 8 -
Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering
Paper • 2509.00798 • Published • 1
-
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper • 2510.03561 • Published • 24 -
ReactiveAI/RxT-Alpha-Supervised
Text Generation • 0.2B • Updated -
ReactiveAI/RxT-Alpha-Mini-Supervised
Text Generation • 0.1B • Updated -
ReactiveAI/RxT-Alpha-Micro-Supervised
Text Generation • 28.8M • Updated • 1
-
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper • 2510.03561 • Published • 24 -
ReactiveAI/RxT-Alpha-Supervised
Text Generation • 0.2B • Updated -
ReactiveAI/RxT-Alpha-Mini-Supervised
Text Generation • 0.1B • Updated -
ReactiveAI/RxT-Alpha-Micro-Supervised
Text Generation • 28.8M • Updated • 1
-
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Paper • 2509.13761 • Published • 16 -
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
Paper • 2509.25849 • Published • 47 -
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper • 2510.03561 • Published • 24 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 497
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 510 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
Paper • 2502.15425 • Published • 9 -
EgoLife: Towards Egocentric Life Assistant
Paper • 2503.03803 • Published • 46 -
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 85
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 301 -
Lizard: An Efficient Linearization Framework for Large Language Models
Paper • 2507.09025 • Published • 18 -
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Paper • 2507.23632 • Published • 6 -
Causal Attention with Lookahead Keys
Paper • 2509.07301 • Published • 21