OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation Paper • 2512.08294 • Published 21 days ago • 17
OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation Paper • 2511.20211 • Published Nov 25 • 12
Architecture Decoupling Is Not All You Need For Unified Multimodal Model Paper • 2511.22663 • Published Nov 27 • 29
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published 28 days ago • 32
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 25 days ago • 38
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding Paper • 2501.08282 • Published Jan 14
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency Paper • 2506.01908 • Published Jun 2
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation Paper • 2511.16671 • Published Nov 20 • 15
Factuality Matters: When Image Generation and Editing Meet Structured Visuals Paper • 2510.05091 • Published Oct 6 • 19