---
base_model:
- Delta-Vector/Control-Nanuq-8B
library_name: transformers
tags:
- mergekit
- merge
---
This is a GRPO-trained version of my Control-Nanuq model, made to mess around with GRPO training. This model is highly experimental. It's **supposed** to do its reasoning inside XML tags, but for some reason it doesn't; possibly I need to train for more epochs.
Trained on 1xA100 80GB provided by Lucyknada, using Unsloth. If you're trying to replicate the model: one, don't. Two, swap out the default L3.1 8B model in the Colab for Control-Nanuq.
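
If you want to check whether outputs actually follow the XML reasoning format mentioned above, here is a minimal sketch of a format check in plain Python. The tag names `<reasoning>` and `<answer>` are assumptions for illustration (this card doesn't say which tags were used in training), and `follows_xml_format`, `format_reward` and `format_hit_rate` are hypothetical helper names, not part of Unsloth or any other library.

```python
import re

# ASSUMPTION: the tag names below are placeholders; swap in whatever
# tags your own training prompt asks the model to use.
FORMAT_PATTERN = re.compile(
    r"^\s*<reasoning>.+?</reasoning>\s*<answer>.+?</answer>\s*$",
    re.DOTALL,  # let "." match newlines so multi-line reasoning is accepted
)


def follows_xml_format(completion: str) -> bool:
    """Return True if the whole completion is a reasoning block followed by an answer block."""
    return FORMAT_PATTERN.match(completion) is not None


def format_reward(completions: list[str]) -> list[float]:
    """Give 1.0 to each completion that follows the format and 0.0 otherwise."""
    return [1.0 if follows_xml_format(c) else 0.0 for c in completions]


def format_hit_rate(completions: list[str]) -> float:
    """Fraction of completions that follow the format (0.0 for an empty list)."""
    if not completions:
        return 0.0
    rewards = format_reward(completions)
    return sum(rewards) / len(rewards)
```

Running `format_hit_rate` over a batch of sampled completions gives a rough number for how often the model sticks to the tags, which might help in judging whether more epochs change anything.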