A collection of mini-swe-agent-plus and corresponding rollout traces that drive Qwen3-8B to a 39% solve rate on SWE-bench Verified. Enjoy!
AI & ML interests
LLM foundation & research @ Kuaishou Technology
Effective supervised fine-tuning (SFT) with synthetic data followed by multi-turn reinforcement learning (RL) for boosting agentic models.
KlearReasoner
- Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
  Paper • 2512.05591 • Published • 16
- CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
  Paper • 2509.20712 • Published • 19
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
  Paper • 2508.07629 • Published • 43
- Kwai-Klear/Klear-Reasoner-8B
  8B • Updated • 38 • 19
Klear1.0
RL with Experience rePlay