FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation Paper • 2503.06680 • Published Mar 9, 2025 • 20
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 30
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4, 2024 • 72