AIRPHA-VLM-7B

Model Overview

AIRPHA-VLM-7B is a 7-billion-parameter vision-language model specialized for aerial scene understanding from drone and UAV perspectives. The model targets three tasks:

  • Aerial Visual Question Answering (VQA): Understanding and answering questions about aerial scenes
  • Aerial Image Captioning: Generating detailed descriptions of drone/aerial imagery
  • Infrastructure Defect Detection: Identifying and describing structural defects from aerial inspections

Model Details

  • Developed by: AquaAge Inc.
  • Model type: Vision-Language Model (VLM)
  • Base Model: Qwen-VL-7B
  • Language(s): English, Japanese
  • Parameters: 7B
  • License: Apache 2.0
  • Version: v0.1 (Beta)

Capabilities

1. Aerial Visual Question Answering

Q: "What infrastructure is visible in this aerial image?"
A: "The image shows a highway overpass with multiple lanes..."

2. Aerial Scene Captioning

Input: [Aerial drone image]
Output: "An aerial view of an urban intersection with surrounding buildings..."

3. Defect Detection

Input: [Infrastructure inspection image]
Output: "Visible cracks on the bridge surface, approximately 2 meters long..."

Quick Start

from transformers import AutoModel, AutoTokenizer
from PIL import Image

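# Load the model and tokenizer (trust_remote_code=True executes custom code from the model repo)
model = AutoModel.from_pretrained("AquaAge/airpha-VLM-7B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AquaAge/airpha-VLM-7B", trust_remote_code=True)

# Load the image and normalize it to 3-channel RGB
image = Image.open("aerial_image.jpg").convert("RGB")

# Run inference: tokenize the prompt, generate with the image, decode the first sequence
prompt = "Describe what you see in this aerial image."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, images=image)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)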
print(response)
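
The Quick Start call sequence can be wrapped in a small helper so that the three capabilities listed above share one entry point. This is a minimal sketch, not part of the released model code: `TASK_PROMPTS`, `build_prompt`, and `run_task` are hypothetical names, and the prompt wording for each task is an assumption, since this card does not document a required prompt format.

```python
from typing import Any

# Hypothetical prompt templates (assumption: the card does not prescribe wording).
TASK_PROMPTS = {
    "vqa": "{question}",
    "caption": "Describe what you see in this aerial image.",
    "defect": "Identify and describe any structural defects visible in this inspection image.",
}


def build_prompt(task: str, question: str = "") -> str:
    """Return the prompt text for one of the three tasks."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(TASK_PROMPTS)}")
    if task == "vqa" and not question.strip():
        raise ValueError("The 'vqa' task needs a non-empty question.")
    return TASK_PROMPTS[task].format(question=question.strip())


def run_task(model: Any, tokenizer: Any, image: Any, task: str, question: str = "") -> str:
    """Mirror the Quick Start steps: tokenize, generate with the image, decode."""
    prompt = build_prompt(task, question)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, images=image)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With `model`, `tokenizer`, and `image` loaded as in the Quick Start, a call such as `run_task(model, tokenizer, image, "vqa", "What infrastructure is visible in this aerial image?")` would follow the same path as the snippet above.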

Performance

Task              Metric    Score
Aerial VQA        Accuracy  TBD
Captioning        CIDEr     TBD
Defect Detection  F1        TBD
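
Until scores are published, two of the metrics named above can be computed locally with a few lines of Python. This is a minimal sketch under stated assumptions: the card does not describe its evaluation protocol, so exact-match accuracy (case-insensitive) for VQA and an image-level binary F1 for defect detection are illustrative choices, and the function names are hypothetical. CIDEr is omitted because it requires a reference-caption corpus and a dedicated implementation.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of answers that match the reference after trimming and lowercasing."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


def binary_f1(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 for a yes/no 'defect present' label per image."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Free-form answers such as those in the Capabilities examples rarely match a reference string exactly, so a real evaluation would likely need answer normalization or a multiple-choice format before exact-match accuracy is meaningful.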

Limitations

  • Beta version (v0.1): no benchmark scores are published yet, so performance may vary
  • Optimized for aerial/drone perspectives
  • May require fine-tuning for specific use cases

Use Cases

  • ๐Ÿš Drone-based infrastructure inspection
  • ๐Ÿ—๏ธ Construction site monitoring
  • ๐ŸŒ† Urban planning and analysis
  • โš ๏ธ Disaster assessment
  • ๐Ÿ›ฃ๏ธ Transportation infrastructure evaluation

Citation

@misc{airpha-vlm-7b-2026,
  author = {AquaAge Inc.},
  title = {AIRPHA-VLM-7B: Vision-Language Model for Aerial Scene Understanding},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AquaAge/airpha-VLM-7B}}
}

Contact

License

This model is released under the Apache 2.0 License.
