ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 3 days ago • 138
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? Paper • 2505.15929 • Published May 21, 2025 • 49
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models Paper • 2508.03332 • Published Aug 5, 2025
LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction Paper • 2509.07403 • Published Sep 9, 2025 • 35
PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models Paper • 2509.16989 • Published Sep 21, 2025 • 1
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving Paper • 2505.23932 • Published May 29, 2025
DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning Paper • 2602.19895 • Published Feb 23 • 14
V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval Paper • 2602.06034 • Published Feb 5 • 8
ATTS: Asynchronous Test-Time Scaling via Conformal Prediction Paper • 2509.15148 • Published Sep 18, 2025 • 1
SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression Paper • 2503.12340 • Published Mar 16, 2025