DynaGuard: A Dynamic Guardrail Model With User-Defined Policies • Paper • 2509.02563 • Published Sep 2 • 20
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents • Paper • 2505.20411 • Published May 26 • 92
Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis • Paper • 2502.20383 • Published Feb 27 • 3