Update README.md
README.md
tags:
- causal-lm
- frozen-embeddings
- conceptual-demo
- transformer
pipeline_tag: text-generation
library_name: transformers
---

# demo_bvv_ru

This repository contains the model and associated resources from the papers

[💻 Code](https://github.com/AVBochkov/Embeddings)

---

## Model summary

**Proof-of-concept Transformer LM with frozen, non-semantic token embeddings trained on a small English-Russian corpus.**

**This model is part of a series of models designed to demonstrate:**
- The viability of transformer language models where the embedding layer is precomputed from non-semantic (Unicode/visual) features and entirely _frozen_ during training.
- The possibility of modular/federated model fusion (MoE) by combining models with a shared token embedding matrix, without any additional retraining or alignment.

- **Parameters:** 0.5B
- **Architecture:** 16-layer transformer, rotary attention, 1024 context, 32 heads.
- **Embedding:** Precomputed, _frozen_, visual/Unicode-based (see the sketch after this list).
- **Training corpus:** Small-scale, <10B tokens, ~10% SFT-mixed (for metric tracking, not strong performance).
- **Languages:** Russian, English.
- **MoE compatibility:** The embedding space is shared with other `bvv` models (e.g. `Bochkov/demo_bvv_zh`), enabling seamless MoE or model fusion at the output-head level.
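
The papers describe the actual visual/Unicode embedding construction; the snippet below is only a minimal sketch of the general idea, with a made-up `unicode_embedding` helper and a toy vocabulary, showing how precomputed, code-point-derived vectors can be loaded into a frozen embedding layer.

```python
import torch
import torch.nn as nn

def unicode_embedding(token: str, dim: int = 64) -> torch.Tensor:
    """Toy non-semantic embedding built purely from Unicode code points.

    Illustrative stand-in, not the exact visual/Unicode scheme from the papers.
    """
    vec = torch.zeros(dim)
    for i, ch in enumerate(token):
        cp = ord(ch)
        # Deterministically spread code-point information across the vector.
        vec[(i * 7 + cp) % dim] += (cp % 251) / 251.0 - 0.5
    return vec

# Hypothetical mini-vocabulary, just to show the mechanics.
vocab = ["Hello", ",", " мир", "!"]
weights = torch.stack([unicode_embedding(t) for t in vocab])

# freeze=True keeps the precomputed embeddings fixed for the entire training run.
embedding = nn.Embedding.from_pretrained(weights, freeze=True)
print(embedding(torch.tensor([0, 2])).shape)  # torch.Size([2, 64])
```

Because such vectors never receive gradients, every model built on the same matrix keeps an identical token geometry, which is what makes the cross-model fusion described below possible.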
|
## Key points

This model was trained on a small corpus and is intended only to demonstrate the viability of frozen, visual/Unicode-derived embeddings for training and transfer between languages.

Performance is not comparable to SOTA, but the model shows competitive compositional skills versus a fully trainable embedding baseline.

For direct benchmarking, see also [Bochkov/demo_bvv_unfrozen_ru] — an identical architecture and dataset, but with standard trainable token embeddings.

The shared embedding space enables seamless fusion/MoE with Bochkov/demo_bvv_zh and Bochkov/demo_bvv_moe (the merged model); a sketch of this idea follows below.
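
`Bochkov/demo_bvv_moe` is the ready-made merged model. Purely as an illustration of what output-head-level fusion can look like when checkpoints share one frozen embedding matrix (and therefore one token space), the sketch below naively averages next-token logits from two compatible models. The uniform 0.5/0.5 mixture and greedy decoding are simplifying assumptions, not how `demo_bvv_moe` was built, and the code assumes the repos' custom modeling code returns standard causal-LM logits.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Both checkpoints share the same frozen embedding matrix and vocabulary,
# so their output logits index the same token ids and can be mixed directly.
tok = AutoTokenizer.from_pretrained('Bochkov/demo_bvv_ru')
m_ru = AutoModelForCausalLM.from_pretrained('Bochkov/demo_bvv_ru', trust_remote_code=True).eval()
m_zh = AutoModelForCausalLM.from_pretrained('Bochkov/demo_bvv_zh', trust_remote_code=True).eval()

ids = tok("Hello, мир! ", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        logits_ru = m_ru(ids).logits[:, -1, :]
        logits_zh = m_zh(ids).logits[:, -1, :]
        mixed = 0.5 * logits_ru + 0.5 * logits_zh     # naive uniform "expert" mixture
        next_id = mixed.argmax(dim=-1, keepdim=True)  # greedy decoding for simplicity
        ids = torch.cat([ids, next_id], dim=-1)

print(tok.decode(ids[0]))
```

A real MoE setup would replace the fixed 0.5/0.5 weights with a router or per-token gate; the point here is only that no retraining or alignment step is needed before mixing.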

Main evaluation:
- MMLU avg: 22.3% ± 0.1
- ARC-e: 23.0%
- CommonsenseQA: 20.1%
- SQuAD: 14.8%
- BLEU [en-ru]: 6.4%
- BLEU [ru-en]: 8.8%
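
For a rough idea of how a BLEU-style score could be reproduced, the sketch below uses `sacrebleu` over model-generated translations. The prompt template and the tiny parallel "test set" are placeholders, not the evaluation protocol behind the figures above.

```python
import sacrebleu
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained('Bochkov/demo_bvv_ru')
model = AutoModelForCausalLM.from_pretrained('Bochkov/demo_bvv_ru', trust_remote_code=True)

# Placeholder parallel data; swap in a real en-ru test set.
pairs = [
    ("The cat sits on the mat.", "Кот сидит на коврике."),
    ("I like tea.", "Я люблю чай."),
]

hyps, refs = [], []
for src, ref in pairs:
    prompt = f"Translate to Russian: {src}\n"  # hypothetical prompt format
    enc = tok(prompt, return_tensors="pt")
    out = model.generate(**enc, max_new_tokens=64, do_sample=False)
    hyps.append(tok.decode(out[0][enc.input_ids.shape[1]:], skip_special_tokens=True))
    refs.append(ref)

print(sacrebleu.corpus_bleu(hyps, [refs]).score)  # corpus BLEU, in percent
```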

This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer (the custom modeling code lives in the repo).
model = AutoModelForCausalLM.from_pretrained('Bochkov/demo_bvv_ru', trust_remote_code=True).to('cuda')
tokenizer = AutoTokenizer.from_pretrained('Bochkov/demo_bvv_ru')

# Mixed English/Russian prompt; sampled continuation.
inputs = tokenizer("Hello, мир! ", return_tensors="pt").to('cuda')
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    do_sample=True
)
print(tokenizer.decode(outputs[0]))
```

## 🧑‍🔬 Citation & Concept

If you find this work helpful or inspiring, please consider citing the associated papers:

```
@article{bochkov2025emergent,
url={https://arxiv.org/abs/2507.07129},
}
```