## Overview
AutoNeural is a next-generation, NPU-native multimodal vision–language model co-designed from the ground up for real-time, on-device inference. Instead of adapting GPU-first architectures, AutoNeural redesigns both vision encoding and language modeling for the constraints and capabilities of NPUs, achieving up to 14× lower latency, roughly 7× lower quantization error, and real-time automotive performance even under aggressive low-precision settings.
AutoNeural integrates:
- A MobileNetV5-based vision encoder with depthwise separable convolutions.
- A Liquid AI hybrid Transformer-SSM language backbone that dramatically reduces KV-cache overhead.
- A normalization-free MLP connector tailored for quantization stability.
- Mixed-precision W8A16 (vision) and W4A16 (language) inference validated on real Qualcomm NPUs.
AutoNeural powers real-time cockpit intelligence, including in-cabin safety, out-of-cabin awareness, HMI understanding, and visual and conversational function calling, as demonstrated in the on-device results below.
## Key Features
### 🔍 MobileNetV5 Vision Encoder (300M)
Optimized for edge hardware, with:
- Depthwise separable convolutions for low compute and bounded activations.
- Local attention bottlenecks only in late stages for efficient long-range reasoning.
- Multi-Scale Fusion Adapter (MSFA) producing a compact 16×16×2048 feature map.
- Stable INT8/16 behavior with minimal post-quantization degradation.
Yields 5.8×–14× speedups over ViT baselines across 256–768 px inputs.
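As a concrete illustration of the first bullet, here is a minimal PyTorch sketch of a depthwise separable convolution block. The channel counts and activation choice are illustrative assumptions, not AutoNeural's published configuration:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise convolution, the building block MobileNet-style
    encoders use in place of a standard dense convolution."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        # ReLU6 keeps activations bounded, which helps INT8/INT16 quantization.
        self.act = nn.ReLU6()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.pointwise(self.act(self.depthwise(x))))

# Example: a 768x768 RGB input downsampled by a hypothetical stride-2 block.
x = torch.randn(1, 3, 768, 768)
y = DepthwiseSeparableConv(3, 32, stride=2)(x)
print(y.shape)  # torch.Size([1, 32, 384, 384])
```

At typical channel counts, the 3×3 depthwise plus 1×1 pointwise pair costs roughly 1/C_out + 1/9 of the multiply-adds of a dense 3×3 convolution, which is a large part of why this encoder style runs fast on edge hardware.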
### 🧠 Hybrid Transformer-SSM Language Backbone (1.2B)
Designed for NPU memory hierarchies:
- 5:1 ratio of SSM layers to Transformer attention layers
- Linear-time gated convolution layers for most steps
- Tiny rolling state instead of KV-cache → up to 60% lower memory bandwidth
- W4A16 stable quantization across layers
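A minimal sketch of what a 5:1 interleave and its memory implications could look like; all layer names, head counts, and state sizes below are hypothetical placeholders, not AutoNeural's published configuration:

```python
# Hypothetical 5:1 interleave of linear-time SSM/gated-conv blocks and
# full attention blocks.
def build_layer_schedule(num_layers: int, ssm_per_attention: int = 5) -> list[str]:
    pattern = ["ssm"] * ssm_per_attention + ["attention"]
    return [pattern[i % len(pattern)] for i in range(num_layers)]

def kv_cache_bytes(seq_len: int, n_attn_layers: int, n_heads: int = 16,
                   head_dim: int = 64, bytes_per_elem: int = 2) -> int:
    # K and V tensors for every attention layer grow linearly with seq_len.
    return seq_len * n_attn_layers * 2 * n_heads * head_dim * bytes_per_elem

def ssm_state_bytes(n_ssm_layers: int, state_size: int = 4096,
                    bytes_per_elem: int = 2) -> int:
    # Each SSM layer keeps a fixed-size rolling state, independent of seq_len.
    return n_ssm_layers * state_size * bytes_per_elem

schedule = build_layer_schedule(24)
print(schedule.count("attention"))  # 4 attention layers out of 24
print(kv_cache_bytes(4096, n_attn_layers=4) / 1e6, "MB KV-cache at 4096 tokens")
print(ssm_state_bytes(20) / 1e6, "MB of rolling SSM state, regardless of context")
```

The point of the comparison: only the few attention layers need a growing KV-cache, while the SSM layers carry a fixed-size rolling state, so per-token memory traffic stays nearly flat as the context grows.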
### 🔗 Normalization-Free Vision–Language Connector
A compact 2-layer MLP using SiLU, deliberately removing RMSNorm to avoid unstable activation ranges during static quantization.
Ensures reliable deployment on W8A16/W4A16 pipelines.
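A minimal sketch of such a connector, assuming the MSFA's 2048-dim features and a hypothetical LM hidden size of 2048 (the real output dimension is not stated here):

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Normalization-free 2-layer MLP projector; omitting RMSNorm keeps
    activation ranges stable under static (calibration-based) quantization."""

    def __init__(self, vision_dim: int = 2048, lm_dim: int = 2048):
        super().__init__()
        self.fc1 = nn.Linear(vision_dim, lm_dim)
        self.act = nn.SiLU()
        self.fc2 = nn.Linear(lm_dim, lm_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No norm layers anywhere on this path.
        return self.fc2(self.act(self.fc1(x)))

# The MSFA's 16x16x2048 feature map flattens to 256 visual tokens of dim 2048.
tokens = torch.randn(1, 256, 2048)
print(VisionLanguageConnector()(tokens).shape)  # torch.Size([1, 256, 2048])
```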
### 🚗 Automotive-Grade Multimodal Intelligence
Trained on 10M Infinity-MM samples plus 200k automotive cockpit samples, covering:
- AI Sentinel (vehicle security)
- AI Greeter (identity recognition)
- Car Finder (parking localization)
- Passenger safety monitoring
Ensures robust performance across lighting conditions, demographics, weather, and motion scenarios.
### ⚡ Real NPU Benchmarks
Validated on Qualcomm SA8295P NPU:
| Metric | Baseline (InternVL 2B) | AutoNeural-VL |
|---|---|---|
| Time to first token (TTFT) | ~1.4 s | ~100 ms |
| Max vision resolution | 448×448 px | 768×768 px |
| RMS quantization error | 3.98% | 0.56% |
| Decode throughput | 15 tok/s | 44 tok/s |
| Context length | 1024 tokens | 4096 tokens |
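The overview's headline ratios follow directly from this table; a quick arithmetic check:

```python
# Reproducing the overview's headline figures from the table above.
ttft_baseline_s, ttft_autoneural_s = 1.4, 0.100
print(round(ttft_baseline_s / ttft_autoneural_s))       # 14  -> "14x lower latency"

rms_err_baseline, rms_err_autoneural = 3.98, 0.56       # percent
print(round(rms_err_baseline / rms_err_autoneural, 1))  # 7.1 -> "~7x lower quant error"

tok_s_baseline, tok_s_autoneural = 15, 44
print(round(tok_s_autoneural / tok_s_baseline, 1))      # 2.9x decode throughput
```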
## How to Use
⚠️ Hardware requirement: AutoNeural is optimized for Qualcomm NPUs.
1) Install Nexa-SDK
Download the SDK and follow the installation steps provided on the model page.
2) Configure authentication
Create an access token in the Model Hub, then run:
```bash
nexa config set license '<access_token>'
```
3) Run the model
```bash
nexa infer NexaAI/AutoNeural
```
Image input
Drag and drop one or more image files into the terminal window. Multiple images can be processed with a single query.
## License
The AutoNeural model is released under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license.
You may:
- Use the model for non-commercial purposes
- Modify and redistribute it with attribution
For commercial licensing, please contact: [email protected]