---
language:
- en
license: mit
library_name: transformers
pipeline_tag: other
tags:
- robotics
- navigation
- embodied-ai
- waypoint-prediction
- qwen
model_name: OpenTrackVLA Qwen0.6B Planner
---

# OpenTrackVLA 🤖 👀

**Visual Navigation & Following for Everyone.**

[License](https://opensource.org/licenses/Apache-2.0) · [arXiv:2509.12129](https://arxiv.org/abs/2509.12129)

**OpenTrackVLA** is a fully open-source Vision-Language-Action (VLA) stack that turns **monocular video** and **natural-language instructions** into actionable, short-horizon waypoints. While we explore massive backbones (8B/30B) internally, this repository is dedicated to democratizing embodied AI: we have intentionally released our highly efficient **0.6B checkpoint** along with the **full training pipeline**.

### 🚀 Why OpenTrackVLA?

* **Fully Open Source:** We release the model weights, the inference code, *and* the training stack, not just an inference wrapper.
* **Accessible:** Designed to be reproduced, fine-tuned, and deployed on affordable compute.
* **Multimodal Control:** Combines learned priors with visual input to guide real or simulated robots via simple text prompts.

> **Acknowledgment:** OpenTrackVLA builds on the ideas introduced by the original [TrackVLA project](https://github.com/wsakobe/TrackVLA). Their partially open release inspired this community-driven effort to keep the ecosystem open so researchers and developers can continue improving the stack together.

## Demo In Action

The system processes a video history and a text instruction to predict future waypoints. A rough sketch of that interface is given below, followed by examples of the tracker in action.
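The snippet below is only illustrative: the repository id, the processor's input keys, and the output format are assumptions, not the confirmed OpenTrackVLA API; the repository's own inference scripts remain the authoritative entry point. It assumes the checkpoint exposes a standard `transformers` remote-code interface.

```python
# Hypothetical inference sketch. The repo id, frame paths, processor inputs, and
# output format are placeholders/assumptions, not the confirmed OpenTrackVLA API.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "OpenTrackVLA/opentrackvla-qwen-0.6b"  # placeholder Hub id (assumption)

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

# A short history of monocular frames plus a natural-language instruction.
frames = [Image.open(f"frames/{i:03d}.jpg") for i in range(8)]  # placeholder frame files
instruction = "Follow the person in the red jacket."

inputs = processor(images=frames, text=instruction, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

# The planner is assumed to emit short-horizon waypoints as text, which a robot
# controller would then parse into relative (x, y) targets.
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```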