Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published 5 days ago • 38
WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance Paper • 2511.12997 • Published Nov 17, 2025 • 11
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents Paper • 2504.13203 • Published Apr 15, 2025 • 35
MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations Paper • 2504.07830 • Published Apr 10, 2025 • 18
Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization Paper • 2410.04717 • Published Oct 7, 2024 • 18
SciCode: A Research Coding Benchmark Curated by Scientists Paper • 2407.13168 • Published Jul 18, 2024 • 17
Instruction Diversity Drives Generalization To Unseen Tasks Paper • 2402.10891 • Published Feb 16, 2024
PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis Paper • 2309.05833 • Published Sep 11, 2023
PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models Paper • 2406.06887 • Published Jun 11, 2024 • 2
Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion Paper • 2401.12947 • Published Jan 23, 2024 • 4