Bochkov committed (verified)
Commit 23c4fd0 · Parent: 64a9999

Update README.md

Files changed (1): README.md (+34, -26)

README.md (updated)

tags:
- causal-lm
- frozen-embeddings
- conceptual-demo
- transformer
pipeline_tag: text-generation
library_name: transformers
---

# demo_bvv_ru

This repository contains the model and associated resources from the papers

[💻 Code](https://github.com/AVBochkov/Embeddings)

---

## Model summary

**Proof-of-concept Transformer LM with frozen, non-semantic token embeddings trained on a small English-Russian corpus.**

**This model is part of a series designed to demonstrate:**
- The viability of transformer language models whose embedding layer is precomputed from non-semantic (Unicode/visual) features and kept entirely _frozen_ during training (a sanity-check sketch follows the fact list below).
- The possibility of modular/federated model fusion (MoE) by combining models that share a token embedding matrix, without any additional retraining or alignment.

- **Parameters:** 0.5B
- **Architecture:** 16-layer transformer, rotary attention, 1024-token context, 32 heads.
- **Embedding:** Precomputed, _frozen_, visual/Unicode-based.
- **Training corpus:** Small-scale, <10B tokens, ~10% SFT-mixed (for metric tracking, not strong performance).
- **Languages:** Russian, English.
- **MoE compatibility:** The embedding space is shared with other `bvv` models (e.g. `Bochkov/demo_bvv_zh`), enabling seamless MoE or model fusion at the output-head level.
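
Because the embedding matrix is precomputed and shared across the `bvv` series, a quick sanity check is to confirm that two checkpoints carry identical input embeddings and to keep them frozen in any further training. The sketch below is illustrative only and assumes the checkpoints' custom code exposes the standard `get_input_embeddings()` accessor; it is not part of the official repository.

```python
# Minimal sketch (assumption: the remote code follows the standard
# transformers interface, including get_input_embeddings()).
import torch
from transformers import AutoModelForCausalLM

ru = AutoModelForCausalLM.from_pretrained('Bochkov/demo_bvv_ru', trust_remote_code=True)
zh = AutoModelForCausalLM.from_pretrained('Bochkov/demo_bvv_zh', trust_remote_code=True)

emb_ru = ru.get_input_embeddings().weight
emb_zh = zh.get_input_embeddings().weight

# Shared, non-semantic embedding space: the two matrices should match exactly.
print("shared embedding matrix:", torch.equal(emb_ru, emb_zh))

# Keep the embeddings frozen if you continue training the transformer blocks.
ru.get_input_embeddings().weight.requires_grad_(False)
```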

## Key points

This model was trained on a small corpus and is intended only to demonstrate the viability of frozen, visual/Unicode-derived embeddings for training and for transfer between languages.

Performance is not comparable to SOTA models, but the model shows competitive compositional skills versus a fully trainable embedding baseline.

For direct benchmarking, see also `Bochkov/demo_bvv_unfrozen_ru`, which uses an identical architecture and dataset but standard trainable token embeddings.
The shared embedding space enables seamless fusion/MoE with `Bochkov/demo_bvv_zh` and `Bochkov/demo_bvv_moe` (a merged model); a toy illustration of output-head fusion follows.
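
As a rough illustration of output-head-level fusion between models that share a token embedding space, the sketch below averages next-token logits from the Russian/English and Chinese checkpoints. It assumes the remote code returns standard causal-LM outputs with a `.logits` field; it is not the actual procedure behind `Bochkov/demo_bvv_moe`.

```python
# Toy two-expert fusion at the output head (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained('Bochkov/demo_bvv_ru')
ru = AutoModelForCausalLM.from_pretrained('Bochkov/demo_bvv_ru', trust_remote_code=True).eval()
zh = AutoModelForCausalLM.from_pretrained('Bochkov/demo_bvv_zh', trust_remote_code=True).eval()

inputs = tok("Hello, мир! ", return_tensors="pt")
with torch.no_grad():
    logits_ru = ru(**inputs).logits[:, -1, :]  # next-token logits, expert 1
    logits_zh = zh(**inputs).logits[:, -1, :]  # next-token logits, expert 2

# Because the vocabulary (and embedding matrix) is shared, the logits are
# directly comparable; a plain average acts as a naive two-expert mixture.
mixed = (logits_ru + logits_zh) / 2
next_id = int(mixed.argmax(dim=-1))
print(tok.decode([next_id]))
```

A real router would weight the experts per token or per input rather than averaging uniformly, but the point is that no embedding re-alignment is needed.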

## Main evaluation

- MMLU (avg): 22.3% ±0.1
- ARC-e: 23.0%
- CommonsenseQA: 20.1%
- SQuAD: 14.8%
- BLEU [en-ru]: 6.4%
- BLEU [ru-en]: 8.8%

This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs: a step toward modular, fusable, multilingual LMs.

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Run on GPU when available, otherwise fall back to CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = AutoModelForCausalLM.from_pretrained('Bochkov/demo_bvv_ru', trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained('Bochkov/demo_bvv_ru')

# Mixed English/Russian prompt; the model is bilingual (en/ru).
inputs = tokenizer("Hello, мир! ", return_tensors="pt").to(device)

# Sampled generation with standard decoding settings.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    do_sample=True
)
print(tokenizer.decode(outputs[0]))
```

## 🧑‍🔬 Citation & Concept

If you find this work helpful or inspiring, please consider citing the associated papers:

```
@article{
bochkov2025emergent,
 
url={https://arxiv.org/abs/2507.07129},
}
```