---
title: NAF Zero-Shot Feature Upsampling
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
---

# 🎯 NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

This Space demonstrates **NAF (Neighborhood Attention Filtering)**, a method for upsampling features from Vision Foundation Models to any resolution without model-specific training.

## 🚀 Features

- **Universal Upsampling**: Works with any Vision Foundation Model (DINOv2, DINOv3, RADIO, DINO, SigLIP, etc.)
- **Arbitrary Resolutions**: Upsample features to any target resolution while preserving the aspect ratio
- **Zero-Shot**: No model-specific training or fine-tuning required
- **Interactive Demo**: Upload your own images or try sample images from various domains

## 🎨 How to Use

1. **Upload an Image**: Click "Upload Your Image" or select from the sample images
2. **Choose a Model**: Select a Vision Foundation Model from the dropdown
3. **Set Resolution**: Choose the target resolution for the upsampled features (64-512)
4. **Click "Upsample Features"**: Compare the low-resolution and high-resolution features side by side

## 📊 Visualization

The output shows three panels:

- **Left**: Your input image
- **Center**: Low-resolution features from the backbone (PCA visualization)
- **Right**: High-resolution features upsampled by NAF

Features are visualized with PCA: the first 3 principal components are mapped to the RGB channels (a minimal sketch of this projection appears at the end of this README).

## 🔬 Supported Models

- **DINOv3**: Latest self-supervised vision models
- **RADIO v2.5**: High-performance vision backbones
- **DINOv2**: Self-supervised learning with registers
- **DINO**: Original self-supervised ViT
- **SigLIP**: Contrastive vision-language models

## 📖 Learn More

- **Paper**: [NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering](https://arxiv.org/abs/2501.01535)
- **Code**: [GitHub Repository](https://github.com/valeoai/NAF)
- **Organization**: [Valeo.ai](https://www.valeo.com/en/valeo-ai/)

## 💡 Use Cases

NAF enables better feature representations for:

- Dense prediction tasks (segmentation, depth estimation)
- High-resolution visual understanding
- Feature matching and correspondence
- Vision-language alignment

## ⚙️ Technical Details

- **Input**: Images up to 512px (aspect ratio is preserved)
- **Processing**: Backbone feature extraction → NAF upsampling (see the pipeline sketch at the end of this README)
- **Output**: High-resolution features at the target resolution
- **Device**: Runs on CPU (free tier) or GPU (faster inference)

## 🤝 Citation

If you use NAF in your research, please cite:

```bibtex
@article{chambon2025naf,
  title={NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering},
  author={Chambon, Lucas and others},
  journal={arXiv preprint arXiv:2501.01535},
  year={2025}
}
```

## 📜 License

This demo is released under the Apache 2.0 license.
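
## 🧪 Code Sketch: Backbone Features → NAF Upsampling

The "backbone feature extraction → NAF upsampling" flow from the Technical Details section looks roughly like the sketch below. It uses the Hugging Face `transformers` DINOv2 backbone purely for illustration; `naf_upsample` is a placeholder (implemented here as plain bilinear interpolation so the snippet runs) and does not reflect NAF's actual interface, which lives in the GitHub repository linked above.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoImageProcessor, AutoModel


def naf_upsample(lr_feats: torch.Tensor, size: tuple[int, int]) -> torch.Tensor:
    # Placeholder for NAF: plain bilinear interpolation so the sketch runs.
    # The real upsampler is guided by the input image; see the repository.
    return F.interpolate(lr_feats, size=size, mode="bilinear", align_corners=False)


processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
backbone = AutoModel.from_pretrained("facebook/dinov2-base").eval()

image = Image.open("example.jpg").convert("RGB")           # any input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    tokens = backbone(**inputs).last_hidden_state[:, 1:]   # drop the CLS token
b, n, c = tokens.shape
side = int(n ** 0.5)                                        # square patch grid
lr_feats = tokens.transpose(1, 2).reshape(b, c, side, side)

hr_feats = naf_upsample(lr_feats, size=(448, 448))          # target resolution
```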
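
## 📐 Code Sketch: PCA Feature Visualization

The PCA-to-RGB projection shown in the center and right panels can be reproduced with a few lines of PyTorch. This is a minimal sketch, independent of the demo's actual `app.py`; the function name `pca_to_rgb` is only for illustration.

```python
import torch


def pca_to_rgb(feats: torch.Tensor) -> torch.Tensor:
    """Project a [C, H, W] feature map onto its first 3 principal
    components and rescale to [0, 1] so it can be displayed as RGB."""
    c, h, w = feats.shape
    x = feats.reshape(c, -1).T                     # [H*W, C] pixels as rows
    x = x - x.mean(dim=0, keepdim=True)            # center each channel
    _, _, v = torch.pca_lowrank(x, q=3, center=False)
    rgb = x @ v                                    # [H*W, 3] projection
    rgb = (rgb - rgb.amin(dim=0)) / (rgb.amax(dim=0) - rgb.amin(dim=0) + 1e-8)
    return rgb.T.reshape(3, h, w)
```

Applying the same projection to both the low-resolution and the upsampled feature maps makes the two panels directly comparable.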