nielsr (HF Staff) committed
Commit 49a17df · verified · 1 Parent(s): d1d3c42

Improve model card with metadata and links to paper and code


This PR enhances the model card by:
- Adding comprehensive metadata, including `license: apache-2.0`, `library_name: transformers`, `pipeline_tag: feature-extraction`, and relevant `tags` (`tokenizer`, `embeddings`, `unicode`, `multilingual`, `modular-lm`). This improves discoverability and enables the "Use in Transformers" widget (see the loading sketch after this list).
- Adding a prominent link to the paper: [Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate](https://huggingface.co/papers/2507.07129).
- Adding a direct link to the GitHub repository for the code: [https://github.com/Bochkov/BVV241-Tokenizer-Benchmarking-and-Frozen-Embedding-Sets](https://github.com/Bochkov/BVV241-Tokenizer-Benchmarking-and-Frozen-Embedding-Sets).
- Minor improvement to the citation block formatting.
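
With `library_name: transformers` and the `pipeline_tag` set, the Hub can surface a loading snippet along the lines of the minimal sketch below; the repo id here is an assumption for illustration, not taken from this commit:

```python
from transformers import AutoTokenizer

# Hypothetical repo id; substitute the actual Hub id of this model card.
tokenizer = AutoTokenizer.from_pretrained("Bochkov/bvv241-nemo")
print(tokenizer("Frozen substrate, trainable blocks.")["input_ids"])
```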

Files changed (1)
1. README.md (+16 -5)
README.md CHANGED

````diff
@@ -1,10 +1,21 @@
 ---
-# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
-# Doc / guide: https://huggingface.co/docs/hub/model-cards
-{}
+license: apache-2.0
+library_name: transformers
+pipeline_tag: feature-extraction
+tags:
+- tokenizer
+- embeddings
+- unicode
+- multilingual
+- modular-lm
 ---
+
 # bvv241-nemo: SOTA Mistral Nemo Tokenizer with BVV-mapped Frozen Embedding
 
+This model contains the tokenizer and frozen embeddings described in the paper [Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate](https://huggingface.co/papers/2507.07129).
+
+Code: [https://github.com/Bochkov/BVV241-Tokenizer-Benchmarking-and-Frozen-Embedding-Sets](https://github.com/Bochkov/BVV241-Tokenizer-Benchmarking-and-Frozen-Embedding-Sets)
+
 ## Tokenizer Description
 
 <!-- Provide a longer summary of what this model is. -->
@@ -50,7 +61,7 @@ embeddings = torch.load(emb_path)
 
 If you use this model or the underlying concepts in your research, please cite our work:
 
-```
+```bibtex
 @misc{bochkov2025emergentsemanticstokenembeddings,
 title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
 author={A. Bochkov},
@@ -72,4 +83,4 @@ If you use this model or the underlying concepts in your research, please cite o
 }
 ```
 
-This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.
+This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.
````
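
For reference, a minimal usage sketch consistent with the `embeddings = torch.load(emb_path)` line visible in the hunk context above; the repo id, the embedding filename, and the mean-pooling step are assumptions for illustration, not part of the model card:

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "Bochkov/bvv241-nemo"                       # hypothetical repo id
emb_path = hf_hub_download(repo_id, "embeddings.pt")  # hypothetical filename

tokenizer = AutoTokenizer.from_pretrained(repo_id)
embeddings = torch.load(emb_path)                     # frozen (vocab_size, dim) matrix

# Look up frozen embeddings for the token ids and mean-pool into one feature vector.
ids = tokenizer("Multilingual frozen-embedding features", return_tensors="pt")["input_ids"]
features = embeddings[ids].mean(dim=1)
print(features.shape)  # (1, dim)
```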