# DGTRS-CLIP-ViT-L-14

This is the DGTRS-CLIP-ViT-L-14 model, a CLIP ViT-L/14 vision-language model for remote sensing image-text alignment. It can be used for tasks such as zero-shot image classification and text-image retrieval.

This model is compatible with both the `transformers` and `diffusers` libraries.

## How to use

### With transformers

```python
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-L-14")
processor = CLIPProcessor.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-L-14")

# Zero-shot classification: score an image against candidate captions.
# "example.jpg" is a placeholder; substitute your own image.
image = Image.open("example.jpg")
texts = ["an aerial photo of an airport", "an aerial photo of a forest"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
```
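For text-image retrieval, a minimal sketch (continuing from the snippet above) embeds images and captions separately with the standard CLIP feature methods and ranks captions by cosine similarity; the file name and captions below are illustrative placeholders:

```python
import torch

# Rank candidate captions against a query image. "query.jpg" and the
# captions are placeholders for your own retrieval corpus.
captions = ["a river delta", "a dense urban area", "farmland with circular fields"]
image_inputs = processor(images=Image.open("query.jpg"), return_tensors="pt")
text_inputs = processor(text=captions, return_tensors="pt", padding=True)

with torch.no_grad():
    image_emb = model.get_image_features(**image_inputs)
    text_emb = model.get_text_features(**text_inputs)

# L2-normalize so the dot product is cosine similarity.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
scores = (image_emb @ text_emb.T).squeeze(0)
print(captions[scores.argmax().item()])
```

The same pattern works in the other direction (one caption against many images) by swapping which side is batched.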

### With diffusers

This model's text encoder can be used with Stable Diffusion: v1.x pipelines also use a CLIP ViT-L/14 text encoder, so it can be swapped in at load time.
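Below is a minimal sketch, assuming this checkpoint stores its text tower in the standard CLIP layout so that `CLIPTextModel` and `CLIPTokenizer` can load it directly; `runwayml/stable-diffusion-v1-5` is just one example base pipeline:

```python
from diffusers import StableDiffusionPipeline
from transformers import CLIPTextModel, CLIPTokenizer

# Load this model's text encoder and tokenizer (standard CLIP layout assumed).
text_encoder = CLIPTextModel.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-L-14")
tokenizer = CLIPTokenizer.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-L-14")

# Swap them into a Stable Diffusion v1.x pipeline, which expects a
# CLIP ViT-L/14 text encoder. The base checkpoint here is one example.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    text_encoder=text_encoder,
    tokenizer=tokenizer,
)
image = pipe("a satellite image of an airport at sunset").images[0]
```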

## Citation

If you use this model in your research, please cite the original paper:

```bibtex
@article{chenDGTRSDDGTRSCLIPDualGranularity2025a,
  title   = {{DGTRSD} and {DGTRSCLIP}: A Dual-Granularity Remote Sensing Image--Text Dataset and Vision--Language Foundation Model for Alignment},
  author  = {Chen, Weizhi and Deng, Yupeng and Jin, Wei and Chen, Jingbo and Chen, Jiansheng and Feng, Yuman and Xi, Zhihao and Liu, Diyou and Li, Kai and Meng, Yu},
  journal = {IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
  year    = {2025},
  volume  = {18},
  pages   = {29113--29130},
  issn    = {2151-1535},
  doi     = {10.1109/JSTARS.2025.3625958}
}
```