nielsr (HF Staff) committed
Commit 49a17df · verified · 1 Parent(s): d1d3c42

Improve model card with metadata and links to paper and code


This PR enhances the model card by:
- Adding comprehensive metadata, including `license: apache-2.0`, `library_name: transformers`, `pipeline_tag: feature-extraction`, and relevant `tags` (`tokenizer`, `embeddings`, `unicode`, `multilingual`, `modular-lm`). This improves discoverability and enables the "Use in Transformers" widget (see the loading sketch after this list).
- Adding a prominent link to the paper: [Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate](https://huggingface.co/papers/2507.07129).
- Adding a direct link to the GitHub repository for the code: [https://github.com/Bochkov/BVV241-Tokenizer-Benchmarking-and-Frozen-Embedding-Sets](https://github.com/Bochkov/BVV241-Tokenizer-Benchmarking-and-Frozen-Embedding-Sets).
- Minor improvement to the citation block formatting.
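
With `library_name: transformers` and the `pipeline_tag` set, the Hub can surface a loading snippet along the lines of the minimal sketch below; the repo id here is an assumption for illustration, not taken from this commit:

```python
from transformers import AutoTokenizer

# Hypothetical repo id; substitute the actual Hub id of this model card.
tokenizer = AutoTokenizer.from_pretrained("Bochkov/bvv241-nemo")
print(tokenizer("Frozen substrate, trainable blocks.")["input_ids"])
```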

Files changed (1)
1. README.md (+16 -5)
README.md CHANGED

````diff
@@ -1,10 +1,21 @@
 ---
-# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
-# Doc / guide: https://huggingface.co/docs/hub/model-cards
-{}
+license: apache-2.0
+library_name: transformers
+pipeline_tag: feature-extraction
+tags:
+- tokenizer
+- embeddings
+- unicode
+- multilingual
+- modular-lm
 ---
+
 # bvv241-nemo: SOTA Mistral Nemo Tokenizer with BVV-mapped Frozen Embedding
 
+This model contains the tokenizer and frozen embeddings described in the paper [Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate](https://huggingface.co/papers/2507.07129).
+
+Code: [https://github.com/Bochkov/BVV241-Tokenizer-Benchmarking-and-Frozen-Embedding-Sets](https://github.com/Bochkov/BVV241-Tokenizer-Benchmarking-and-Frozen-Embedding-Sets)
+
 ## Tokenizer Description
 
 <!-- Provide a longer summary of what this model is. -->
@@ -50,7 +61,7 @@ embeddings = torch.load(emb_path)
 
 If you use this model or the underlying concepts in your research, please cite our work:
 
-```
+```bibtex
 @misc{bochkov2025emergentsemanticstokenembeddings,
 title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
 author={A. Bochkov},
@@ -72,4 +83,4 @@ If you use this model or the underlying concepts in your research, please cite o
 }
 ```
 
-This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.
+This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.
````
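
For reference, a minimal usage sketch consistent with the `embeddings = torch.load(emb_path)` line visible in the hunk context above; the repo id, the embedding filename, and the mean-pooling step are assumptions for illustration, not part of the model card:

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "Bochkov/bvv241-nemo"                       # hypothetical repo id
emb_path = hf_hub_download(repo_id, "embeddings.pt")  # hypothetical filename

tokenizer = AutoTokenizer.from_pretrained(repo_id)
embeddings = torch.load(emb_path)                     # frozen (vocab_size, dim) matrix

# Look up frozen embeddings for the token ids and mean-pool into one feature vector.
ids = tokenizer("Multilingual frozen-embedding features", return_tensors="pt")["input_ids"]
features = embeddings[ids].mean(dim=1)
print(features.shape)  # (1, dim)
```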