Update README.md
README.md
CHANGED
@@ -45,14 +45,16 @@ Performance is not comparable to SOTA but shows competitive compositional skills
 
 For direct benchmarking, see also [Bochkov/demo_bvv_unfrozen_ru] — an identical architecture and dataset, but with standard trainable token embeddings.
 Enables seamless fusion/MoE with Bochkov/demo_bvv_zh and Bochkov/demo_bvv_moe (merged model) due to shared embedding space.
-
-
-
-
-
-
-
-
+
+## Key results
+
+- **MMLU avg**: 22.3% ±0.1
+- **ARC-e**: 23.0%
+- **ARC-c**: 24.6%
+- **CommonsenseQA**: 20.1%
+- **SQUAD**: 14.8%
+- **BLEU [en-ru]**: 6.4%
+- **BLEU [ru-en]**: 8.8%
 
 This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.
 
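The fusion/MoE claim in the diff above rests on the checkpoints sharing one frozen embedding space. Below is a minimal sketch of how that property could be checked with `transformers`; the repo ids `Bochkov/demo_bvv_ru` and `Bochkov/demo_bvv_zh` and the need for `trust_remote_code` are assumptions, not confirmed by this commit.

```python
# Hypothetical sketch, not from this repo: verify that two checkpoints
# share the same frozen token-embedding matrix, which is what would let
# them be fused or combined into an MoE without remapping token ids.
import torch
from transformers import AutoModelForCausalLM

# Repo ids are assumptions based on the model names mentioned in the README.
ru = AutoModelForCausalLM.from_pretrained("Bochkov/demo_bvv_ru", trust_remote_code=True)
zh = AutoModelForCausalLM.from_pretrained("Bochkov/demo_bvv_zh", trust_remote_code=True)

# Identical input embeddings mean every token id maps to the same vector
# in both models, so their transformer blocks operate in one shared space.
shared = torch.equal(ru.get_input_embeddings().weight,
                     zh.get_input_embeddings().weight)
print("shared embedding space:", shared)
```

If the matrices match, a merged or routed model can send tokens through either transformer stack without any embedding conversion, which is presumably how the Bochkov/demo_bvv_moe merged model is assembled.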