OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Paper • 2509.01644 • Published Sep 1, 2025 • 33
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library Paper • 1610.00768 • Published Oct 3, 2016
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs Paper • 2311.16101 • Published Nov 27, 2023 • 1
Compress & Align: Curating Image-Text Data with Human Knowledge Paper • 2312.06726 • Published Dec 11, 2023
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning Paper • 2312.11420 • Published Dec 18, 2023 • 2
SPFormer: Enhancing Vision Transformer with Superpixel Representation Paper • 2401.02931 • Published Jan 5, 2024
Masked Autoencoders Enable Efficient Knowledge Distillers Paper • 2208.12256 • Published Aug 25, 2022
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training Paper • 2211.11446 • Published Nov 21, 2022
Unleashing the Power of Visual Prompting At the Pixel Level Paper • 2212.10556 • Published Dec 20, 2022
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23, 2024 • 38
VHELM: A Holistic Evaluation of Vision Language Models Paper • 2410.07112 • Published Oct 9, 2024 • 3
M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation Paper • 2411.10433 • Published Nov 15, 2024
CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions Paper • 2411.16828 • Published Nov 25, 2024 • 1