---
title: NAF Zero-Shot Feature Upsampling
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
---

# 🎯 NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

This Space demonstrates **NAF (Neighborhood Attention Filtering)**, a method for upsampling features from Vision Foundation Models to any resolution without model-specific training.

## 🚀 Features

- **Universal Upsampling**: Works with any Vision Foundation Model (DINOv2, DINOv3, RADIO, DINO, SigLIP, etc.)
- **Arbitrary Resolutions**: Upsample features to any target resolution while preserving the aspect ratio
- **Zero-Shot**: No model-specific training or fine-tuning required
- **Interactive Demo**: Upload your own images or try sample images from various domains

## 🎨 How to Use

1. **Upload an Image**: Click "Upload Your Image" or select from the sample images
2. **Choose a Model**: Select a Vision Foundation Model from the dropdown
3. **Set Resolution**: Choose the target resolution for the upsampled features (64-512)
4. **Click "Upsample Features"**: Compare the low-resolution and high-resolution features side by side

## 📊 Visualization

The output shows three panels:

- **Left**: Your input image
- **Center**: Low-resolution features from the backbone (PCA visualization)
- **Right**: High-resolution features upsampled by NAF

Features are visualized with PCA: the first 3 principal components are mapped to the RGB channels (a minimal sketch of this projection appears at the end of this README).

## 🔬 Supported Models

- **DINOv3**: Latest self-supervised vision models
- **RADIO v2.5**: High-performance vision backbones
- **DINOv2**: Self-supervised learning with registers
- **DINO**: Original self-supervised ViT
- **SigLIP**: Contrastive vision-language models

## 📖 Learn More

- **Paper**: [NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering](https://arxiv.org/abs/2501.01535)
- **Code**: [GitHub Repository](https://github.com/valeoai/NAF)
- **Organization**: [Valeo.ai](https://www.valeo.com/en/valeo-ai/)

## 💡 Use Cases

NAF enables better feature representations for:

- Dense prediction tasks (segmentation, depth estimation)
- High-resolution visual understanding
- Feature matching and correspondence
- Vision-language alignment

## ⚙️ Technical Details

- **Input**: Images up to 512px (aspect ratio is preserved)
- **Processing**: Backbone feature extraction → NAF upsampling (see the pipeline sketch at the end of this README)
- **Output**: High-resolution features at the target resolution
- **Device**: Runs on CPU (free tier) or GPU (faster inference)

## 🤝 Citation

If you use NAF in your research, please cite:

```bibtex
@article{chambon2025naf,
  title={NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering},
  author={Chambon, Lucas and others},
  journal={arXiv preprint arXiv:2501.01535},
  year={2025}
}
```

## 📜 License

This demo is released under the Apache 2.0 license.
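
## 🧪 Code Sketch: Backbone Features → NAF Upsampling

The "backbone feature extraction → NAF upsampling" flow from the Technical Details section looks roughly like the sketch below. It uses the Hugging Face `transformers` DINOv2 backbone purely for illustration; `naf_upsample` is a placeholder (implemented here as plain bilinear interpolation so the snippet runs) and does not reflect NAF's actual interface, which lives in the GitHub repository linked above.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoImageProcessor, AutoModel


def naf_upsample(lr_feats: torch.Tensor, size: tuple[int, int]) -> torch.Tensor:
    # Placeholder for NAF: plain bilinear interpolation so the sketch runs.
    # The real upsampler is guided by the input image; see the repository.
    return F.interpolate(lr_feats, size=size, mode="bilinear", align_corners=False)


processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
backbone = AutoModel.from_pretrained("facebook/dinov2-base").eval()

image = Image.open("example.jpg").convert("RGB")           # any input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    tokens = backbone(**inputs).last_hidden_state[:, 1:]   # drop the CLS token
b, n, c = tokens.shape
side = int(n ** 0.5)                                        # square patch grid
lr_feats = tokens.transpose(1, 2).reshape(b, c, side, side)

hr_feats = naf_upsample(lr_feats, size=(448, 448))          # target resolution
```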
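
## 📐 Code Sketch: PCA Feature Visualization

The PCA-to-RGB projection shown in the center and right panels can be reproduced with a few lines of PyTorch. This is a minimal sketch, independent of the demo's actual `app.py`; the function name `pca_to_rgb` is only for illustration.

```python
import torch


def pca_to_rgb(feats: torch.Tensor) -> torch.Tensor:
    """Project a [C, H, W] feature map onto its first 3 principal
    components and rescale to [0, 1] so it can be displayed as RGB."""
    c, h, w = feats.shape
    x = feats.reshape(c, -1).T                     # [H*W, C] pixels as rows
    x = x - x.mean(dim=0, keepdim=True)            # center each channel
    _, _, v = torch.pca_lowrank(x, q=3, center=False)
    rgb = x @ v                                    # [H*W, 3] projection
    rgb = (rgb - rgb.amin(dim=0)) / (rgb.amax(dim=0) - rgb.amin(dim=0) + 1e-8)
    return rgb.T.reshape(3, h, w)
```

Applying the same projection to both the low-resolution and the upsampled feature maps makes the two panels directly comparable.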