---
language:
- en
license: mit
library_name: transformers
pipeline_tag: other
tags:
- robotics
- navigation
- embodied-ai
- waypoint-prediction
- qwen
model_name: OpenTrackVLA Qwen0.6B Planner
---

# OpenTrackVLA 🤖 👀

**Visual Navigation & Following for Everyone.**

[Paper](https://arxiv.org/abs/2509.12129)

**OpenTrackVLA** is a fully open-source Vision-Language-Action (VLA) stack that turns **monocular video** and **natural-language instructions** into actionable, short-horizon waypoints. While we explore massive backbones (8B/30B) internally, this repository is dedicated to democratizing embodied AI: we have intentionally released our highly efficient **0.6B checkpoint** along with the **full training pipeline**.

### 🚀 Why OpenTrackVLA?

* **Fully Open Source:** We release the model weights, inference code, *and* the training stack, not just an inference wrapper.
* **Accessible:** Designed to be reproduced, fine-tuned, and deployed on affordable compute.
* **Multimodal Control:** Combines learned priors with visual input to guide real or simulated robots via simple text prompts.

> **Acknowledgment:** OpenTrackVLA builds on the ideas introduced by the original [TrackVLA project](https://github.com/wsakobe/TrackVLA). Their partially open release inspired this community-driven effort to keep the ecosystem open so researchers and developers can continue improving the stack together.

## Demo In Action

The system processes video history and text instructions to predict future waypoints. Below are examples of the tracker in action:
*Demo videos: Tracked clip 1 · Tracked clip 2*
This directory contains the HuggingFace-friendly export of the **OpenTrackVLA** planner.

Full project (code, datasets, training pipeline): https://github.com/om-ai-lab/OpenTrackVLA

---

## Downloading from HuggingFace

### Python

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("omlab/opentrackvla-qwen06b").eval()
```

## Habitat evaluation using this export

* [OpenTrackVLA GitHub Repository](https://github.com/om-ai-lab/OpenTrackVLA)
* [Full Project Documentation](https://github.com/om-ai-lab/OpenTrackVLA#readme)

`trained_agent.py` prefers HuggingFace weights when either env var is set:

- `HF_MODEL_DIR=/abs/path/to/open_trackvla_hf` (already downloaded)
- `HF_MODEL_ID=omlab/opentrackvla-qwen06b` (auto-download via `huggingface_hub`)

Example:

```bash
HF_MODEL_ID=omlab/opentrackvla-qwen06b bash eval.sh
```
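
If you prefer the `HF_MODEL_DIR` route, the sketch below shows one way to pre-download the export with `huggingface_hub` before running the evaluation. The `./open_trackvla_hf` target directory is only an illustrative choice; any local path works.

```python
# Sketch: pre-download the HuggingFace export so eval.sh can run offline
# via HF_MODEL_DIR (the local_dir below is an arbitrary example path).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="omlab/opentrackvla-qwen06b",
    local_dir="./open_trackvla_hf",
)
print(f"Weights downloaded to: {local_dir}")
```

Afterwards, point `HF_MODEL_DIR` at that directory (as an absolute path) when invoking `eval.sh`, analogous to the `HF_MODEL_ID` example above.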