AIRPHA-VLM-7B

Model Overview

AIRPHA-VLM-7B is a 7-billion-parameter vision-language model specialized for aerial scene understanding from drone and UAV perspectives. The model targets three tasks:

Aerial Visual Question Answering (VQA): Understanding and answering questions about aerial scenes
Aerial Image Captioning: Generating detailed descriptions of drone/aerial imagery
Infrastructure Defect Detection: Identifying and describing structural defects from aerial inspections

Model Details

Developed by: AquaAge Inc.
Model type: Vision-Language Model (VLM)
Base Model: Qwen-VL-7B
Language(s): English, Japanese
Parameters: 7B
License: Apache 2.0
Version: v0.1 (Beta)

Capabilities

1. Aerial Visual Question Answering

Q: "What infrastructure is visible in this aerial image?"
A: "The image shows a highway overpass with multiple lanes..."

2. Aerial Scene Captioning

Input: [Aerial drone image]
Output: "An aerial view of an urban intersection with surrounding buildings..."

3. Defect Detection

Input: [Infrastructure inspection image]
Output: "Visible cracks on the bridge surface, approximately 2 meters long..."
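
The three capabilities above differ mainly in the prompt sent alongside the image. A minimal sketch of choosing a prompt per task follows. The `PROMPTS` table, the `build_prompt` helper, and the defect-inspection wording are hypothetical illustrations, not part of the model's API.

```python
# Hypothetical prompt templates, one per capability listed in this section.
# The caption prompt matches the Quick Start example; the defect prompt
# wording is an assumption made for illustration.
PROMPTS = {
    "caption": "Describe what you see in this aerial image.",
    "defect": "Describe any visible structural defects in this inspection image.",
}


def build_prompt(task, question=None):
    """Return the text prompt for a task: 'vqa', 'caption', or 'defect'."""
    if task == "vqa":
        # VQA passes the user's question through unchanged (minus whitespace).
        if question is None or not question.strip():
            raise ValueError("task 'vqa' requires a non-empty question")
        return question.strip()
    if task not in PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return PROMPTS[task]


if __name__ == "__main__":
    print(build_prompt("vqa", "What infrastructure is visible in this aerial image?"))
    print(build_prompt("caption"))
    print(build_prompt("defect"))
```

The returned string can be used as the `prompt` value in the Quick Start code.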

Quick Start

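from transformers import AutoModel, AutoTokenizer
from PIL import Image

# Load the model and tokenizer. trust_remote_code=True executes the custom
# modeling code shipped in the model repository, so review it before running.
model = AutoModel.from_pretrained("AquaAge/airpha-VLM-7B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AquaAge/airpha-VLM-7B", trust_remote_code=True)

# Load the image and normalize it to 3-channel RGB
image = Image.open("aerial_image.jpg").convert("RGB")

# Inference. `images` is not a standard generate() argument; it is assumed
# to be handled by the repository's custom code.
prompt = "Describe what you see in this aerial image."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, images=image)

# Decode the first generated sequence, dropping special tokens
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)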

Performance

Task	Metric	Score
Aerial VQA	Accuracy	TBD
Captioning	CIDEr	TBD
Defect Detection	F1	TBD
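
Until scores are published, the Accuracy and F1 metrics in the table can be computed on your own labeled data. The sketch below assumes exact-match scoring for VQA answers and binary defect / no-defect labels for detection. The model card does not state the evaluation protocol, so both choices and the function names are assumptions. CIDEr is omitted because it needs a set of reference captions and a dedicated implementation.

```python
def vqa_accuracy(predictions, references):
    """Exact-match accuracy, ignoring case and surrounding whitespace.

    Assumption: one reference answer per question.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


def binary_f1(predicted, actual):
    """F1 score for binary labels, where True means 'defect present'."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(vqa_accuracy(["A bridge", "road"], ["a bridge", "river"]))
    print(binary_f1([True, True, False, False], [True, False, True, False]))
```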

Limitations

Beta version (v0.1): benchmark scores are not yet published, so performance may vary
Optimized for aerial/drone perspectives
May require fine-tuning for specific use cases

Use Cases

🚁 Drone-based infrastructure inspection
🏗️ Construction site monitoring
🌆 Urban planning and analysis
⚠️ Disaster assessment
🛣️ Transportation infrastructure evaluation

Citation

@misc{airpha-vlm-7b-2026,
  author = {AquaAge Inc.},
  title = {AIRPHA-VLM-7B: Vision-Language Model for Aerial Scene Understanding},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AquaAge/airpha-VLM-7B}}
}

Contact

Organization: AquaAge Inc.
Issues: Please report issues on our GitHub

License

This model is released under the Apache 2.0 License.
