Yiming Wu
weleen
AI & ML interests
Computer Vision
Organizations
aigc
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
  Paper • 2311.10709 • Published • 25
- Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
  Paper • 2405.12970 • Published • 25
- FIFO-Diffusion: Generating Infinite Videos from Text without Training
  Paper • 2405.11473 • Published • 56
- stabilityai/stable-diffusion-3-medium
  Text-to-Image • Updated • 6.72k • 4.9k
foundation model
- DreamLLM: Synergistic Multimodal Comprehension and Creation
  Paper • 2309.11499 • Published • 59
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 132
- No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
  Paper • 2405.08344 • Published • 15