Update README.md
README.md CHANGED
```diff
@@ -8,9 +8,8 @@ tags:
 - research
 - pytorch
 - vlm
-
-
-- lusxvr/nanoVLM-222M
+datasets:
+- HuggingFaceM4/the_cauldron
 ---
 
 **nanoVLM** is a minimal and lightweight Vision-Language Model (VLM) designed for efficient training and experimentation. Built using pure PyTorch, the entire model architecture and training logic fits within ~750 lines of code. It combines a ViT-based image encoder (SigLIP-B/16-224-85M) with a lightweight causal language model (SmolLM2-135M), resulting in a compact 222M parameter model.
@@ -26,4 +25,7 @@ Follow the install instructions and run the following code:
 from models.vision_language_model import VisionLanguageModel
 
 model = VisionLanguageModel.from_pretrained("kulia-moon/jasVLM-nanoVLM")
-```
+```
+# Evaluation
+
+
```
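For orientation, below is a rough end-to-end inference sketch built around the snippet shown in the diff. Only `VisionLanguageModel.from_pretrained("kulia-moon/jasVLM-nanoVLM")` comes from the card itself; the tokenizer choice (SmolLM2-135M), the 224x224 image preprocessing (implied by the SigLIP-B/16-224 encoder), and the `model.generate(...)` signature are assumptions modelled on the upstream nanoVLM repository, so check its `generate.py` for the exact interface.

```python
# Illustrative sketch only: apart from from_pretrained(), the helper choices
# below (tokenizer, preprocessing, generate signature) are assumptions based
# on the upstream nanoVLM repository, not guarantees of this checkpoint's API.
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoTokenizer

from models.vision_language_model import VisionLanguageModel  # from the nanoVLM codebase

model = VisionLanguageModel.from_pretrained("kulia-moon/jasVLM-nanoVLM").eval()

# SmolLM2-135M is the language backbone named in the card; tokenizer is assumed.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")

# SigLIP-B/16-224 implies 224x224 inputs; normalization is omitted here and
# should follow whatever the checkpoint's own image processor expects.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
prompt = "Question: What is shown in the image? Answer:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    # Assumed signature, mirroring generate.py in the upstream nanoVLM repo.
    output_ids = model.generate(input_ids, image, max_new_tokens=30)

print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```

If the checkpoint ships its own processor or config, prefer those over the hand-rolled preprocessing in this sketch.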