Daily Papers

by AK and the research community

Jan 23

Submitted by

LutherXD

EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

meituan

Submitted by

freesky

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

OpenMOSS-Team

Submitted by

daixuancheng

LLM-in-Sandbox Elicits General Agentic Intelligence

MicrosoftResearch

Microsoft Research

Submitted by

nzl-thu

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Tsinghua-LeapLab

Submitted by

LiamLian0727

BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

ZGCA

Zhongguancun Academy

Submitted by

bytetriper

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

nyu-visionx

Submitted by

Facico

Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

ByteDance-Seed

Submitted by

LXT

SAMTok: Representing Any Mask with Two Words

ByteDance

Submitted by

vinid

Learning to Discover at Test Time

StanfordUniversity

Stanford University

Submitted by

taesiri

Qwen3-TTS Technical Report

Qwen

Submitted by

taesiri

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

85 authors

Submitted by

cihangxie

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

UCSC-VLAA

Submitted by

ZacLiu

Towards Automated Kernel Generation in the Era of LLMs

14 authors

Submitted by

songtingyu

Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing

9 authors

Submitted by

Remy

ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion

Submitted by

SammyLim

VideoMaMa: Mask-Guided Video Matting via Generative Prior

adobe

Submitted by

taesiri

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

nvidia

Submitted by

Raymond-Qiancx

PROGRESSLM: Towards Progress Reasoning in Vision-Language Models

7 authors

Submitted by

Dazitu616

360Anything: Geometry-Free Lifting of Images and Videos to 360°

deepmind

Submitted by

zhangjiaxin2012

Agentic Uncertainty Quantification

Salesforce

2

Submitted by

zhangjiaxin2012

Agentic Confidence Calibration

Salesforce

2

Submitted by

zhangjiaxin2012

From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models

Salesforce

Submitted by

rhachiuma

VIOLA: Towards Video In-Context Learning with Minimal Annotations

3 authors

Submitted by

rajkumarrawal

LLM Prompt Evaluation for Educational Applications

VanderbiltUniversity

Vanderbilt University

Submitted by

Cohaerence

Wigner's Friend as a Circuit: Inter-Branch Communication Witness Benchmarks on Superconducting Quantum Hardware

1 author

Submitted by

sandyherho

Numba-Accelerated 2D Diffusion-Limited Aggregation: Implementation and Fractal Characterization

ITB

Institut Teknologi Bandung

Submitted by

ashutosh1919

MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness

SAP