Update README.md
README.md
CHANGED
@@ -45,14 +45,16 @@ Performance is not comparable to SOTA but shows competitive compositional skills
 
 For direct benchmarking, see also [Bochkov/demo_bvv_unfrozen_ru] — an identical architecture and dataset, but with standard trainable token embeddings.
 Enables seamless fusion/MoE with Bochkov/demo_bvv_zh and Bochkov/demo_bvv_moe (merged model) due to shared embedding space.
-
-
-
-
-
-
-
-
+
+## Key results
+
+- **MMLU avg**: 22.3% ±0.1
+- **ARC-e**: 23.0%
+- **ARC-c**: 24.6%
+- **CommonsenseQA**: 20.1%
+- **SQUAD**: 14.8%
+- **BLEU [en-ru]**: 6.4%
+- **BLEU [ru-en]**: 8.8%
 
 This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.
 
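The fusion/MoE claim in the diff above rests on the checkpoints sharing one frozen embedding space. Below is a minimal sketch of how that property could be checked with `transformers`; the repo ids `Bochkov/demo_bvv_ru` and `Bochkov/demo_bvv_zh` and the need for `trust_remote_code` are assumptions, not confirmed by this commit.

```python
# Hypothetical sketch, not from this repo: verify that two checkpoints
# share the same frozen token-embedding matrix, which is what would let
# them be fused or combined into an MoE without remapping token ids.
import torch
from transformers import AutoModelForCausalLM

# Repo ids are assumptions based on the model names mentioned in the README.
ru = AutoModelForCausalLM.from_pretrained("Bochkov/demo_bvv_ru", trust_remote_code=True)
zh = AutoModelForCausalLM.from_pretrained("Bochkov/demo_bvv_zh", trust_remote_code=True)

# Identical input embeddings mean every token id maps to the same vector
# in both models, so their transformer blocks operate in one shared space.
shared = torch.equal(ru.get_input_embeddings().weight,
                     zh.get_input_embeddings().weight)
print("shared embedding space:", shared)
```

If the matrices match, a merged or routed model can send tokens through either transformer stack without any embedding conversion, which is presumably how the Bochkov/demo_bvv_moe merged model is assembled.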