MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research • Paper 2505.19955 • Published May 26, 2025
ConfTuner: Training Large Language Models to Express Their Confidence Verbally • Paper 2508.18847 • Published Aug 26, 2025