webxos/microd_v1
webxos committed 27 days ago

Commit e07033f (verified) · 1 parent: 970442c

Update README.md

Files changed (1): README.md

@@ -52,8 +52,7 @@ small set of files the user can use to template their own agents. Designed for e
 Use **MICROD V1.0 (micro-distill-grpo-vae)** in your own custom projects and train it from the ground up.
 The model's architecture details further underscore an educational niche: a hidden size of 512, 8 layers, 8 attention heads, a vocabulary of 50,257 tokens,
-and a max sequence length of 1024. It supports KV-cache reuse with a 512 cache size, enabling faster generation for sequential thoughts, though this feature
-is noted as inactive in some interfaces. Licensed under Apache 2.0, it's openly available for modification, and its small footprint allows quantization,
+and a max sequence length of 1024. Licensed under Apache 2.0, it's openly available for modification, and its small footprint allows quantization,
 making it runnable on modest hardware like CPUs or even browsers via TensorFlow.js integration.
 ## Model Details
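
For a rough sense of the "small footprint" the README mentions, the hyperparameters quoted in the diff (hidden size 512, 8 layers, 8 heads, vocabulary 50,257, max sequence length 1024) can be turned into a parameter estimate. The sketch below is not taken from the repository. It assumes a GPT-2-style decoder layout: learned positional embeddings, a fused QKV projection with biases, a feed-forward width of 4x the hidden size, two LayerNorms per block plus a final one, and an output head tied to the token embedding. `MicrodConfig`, `estimate_params`, and `ffn_multiplier` are hypothetical names used only for illustration; the real model may be laid out differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicrodConfig:
    # Values quoted in the README diff.
    hidden_size: int = 512
    num_layers: int = 8
    num_heads: int = 8
    vocab_size: int = 50_257
    max_seq_len: int = 1024
    # ASSUMPTION: GPT-2-style feed-forward width (not stated in the README).
    ffn_multiplier: int = 4


def estimate_params(cfg: MicrodConfig) -> dict:
    """Estimate parameter counts for an assumed GPT-2-style decoder."""
    d = cfg.hidden_size
    ffn = cfg.ffn_multiplier * d

    # Token embedding + learned positional embedding (output head assumed tied).
    embeddings = cfg.vocab_size * d + cfg.max_seq_len * d

    # Fused QKV projection and output projection, each with a bias.
    attention = (d * 3 * d + 3 * d) + (d * d + d)

    # Two-layer MLP with biases.
    mlp = (d * ffn + ffn) + (ffn * d + d)

    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d)

    per_layer = attention + mlp + layer_norms
    final_norm = 2 * d
    total = embeddings + cfg.num_layers * per_layer + final_norm

    return {
        "head_dim": d // cfg.num_heads,
        "embeddings": embeddings,
        "per_layer": per_layer,
        "total": total,
    }


if __name__ == "__main__":
    for name, value in estimate_params(MicrodConfig()).items():
        print(f"{name:>12}: {value:,}")
```

Under these assumptions the estimate comes to roughly 51 million parameters, about half of them in the token embedding table. That proportion is worth keeping in mind when considering quantization for CPU or browser targets.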