Emergent Semantics: Model_1024_FLOAT (335M)
This repository provides Model_1024_FLOAT (335M), an ablation model from the paper "Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations" (TMLR 2025).
This checkpoint is designed to isolate the effect of float-valued / normalized frozen embeddings versus binary frozen embeddings, while keeping the Transformer backbone and training setup the same.
What this ablation is
Model_1024_FLOAT uses a frozen embedding table where:
- `n_embed = 1024` (embedding dimensionality equals `d_model`)
- Each token embedding is a float vector
- The embedding vectors are derived from a random (non-semantic) codebook and then normalized (e.g., L2 normalization) to control scale
- The embedding weights are frozen (`requires_grad=False`) for the entire training run; see the sketch below
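To make the setup concrete, here is a minimal PyTorch sketch of how such a frozen, normalized random codebook could be constructed. The Gaussian initialization and row-wise L2 normalization are assumptions for illustration; the exact codebook generation used for the released checkpoint may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, n_embed = 65536, 1024  # values from the model summary below

# Random (non-semantic) float codebook; Gaussian init is assumed here for illustration.
codebook = torch.randn(vocab_size, n_embed)
# Row-wise L2 normalization so every token vector has the same scale.
codebook = F.normalize(codebook, p=2, dim=-1)

embedding = nn.Embedding(vocab_size, n_embed)
with torch.no_grad():
    embedding.weight.copy_(codebook)
embedding.weight.requires_grad_(False)  # frozen for the entire training run
```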
This model is part of an ablation series that tests whether differences in training dynamics / downstream reasoning come from:
- semantic structure in embeddings (hypothesis: not required),
- or simply numeric properties like dtype/scale/normalization.
Relation to other models in the collection
Compared to Model_1024_BIT (335M):
- Same backbone (`d_model=1024`, 16 layers, 32 heads, RoPE, GELU)
- Same embedding dimensionality (`n_embed=1024`)
- Difference is the embedding representation:
  - 1024_BIT: frozen random binary vectors
  - 1024_FLOAT: frozen random float vectors with normalization
Compared to Model_UNI_GLYPH (335M):
- Same embedding dimensionality and frozen setup
- UNI_GLYPH embeddings come from glyph-rendering + PCA; here embeddings are random and intended to be non-semantic
Compared to Model_unfrozen (335M):
- Same architecture
- Here embeddings are frozen; in the baseline they are trainable
Because n_embed=1024, this model is in the same parameter-count class (~335M) as UNI_GLYPH and the unfrozen baseline.
Model summary
- Architecture: decoder-only Transformer (GPT-like)
- Hidden size (`d_model`): 1024
- Layers: 16
- Heads: 32
- Positional encoding: rotary embeddings
- Activation: GELU
- Vocabulary size: 65,536
- Tokenizer:
Bochkov/bvv241-2-3compatible - Input embeddings: frozen, random float, normalized,
n_embed=1024 - Output head: not tied to the input embeddings (trained separately)
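For quick reference, the summary above maps onto roughly the following hyperparameters. The dictionary keys here are hypothetical and chosen only to mirror the list; they are not the actual configuration names used in the repository.

```python
# Illustrative only: keys are hypothetical, values restate the model summary.
model_summary = {
    "architecture": "decoder-only Transformer (GPT-like)",
    "d_model": 1024,
    "n_layers": 16,
    "n_heads": 32,
    "positional_encoding": "rotary",
    "activation": "gelu",
    "vocab_size": 65536,
    "n_embed": 1024,               # frozen, random float, normalized
    "tie_word_embeddings": False,  # output head trained separately
}
```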
Tokenizer
The intended tokenizer is `bvv241-2-3` (https://huggingface.co/Bochkov/bvv241-2-3).
You can load the tokenizer either from this model repo (if included) or from the standalone tokenizer repo. The key requirement is exact vocab alignment.
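One way to verify that alignment, assuming the tokenizer files are present in both repositories, is to load the tokenizer from each source and compare vocabularies; this is only a sanity-check sketch.

```python
from transformers import AutoTokenizer

tok_model = AutoTokenizer.from_pretrained("Bochkov/emergent-semantics-model-1024-float-335m")
tok_standalone = AutoTokenizer.from_pretrained("Bochkov/bvv241-2-3")

# Both sources should expose identical token-to-id mappings.
assert tok_model.get_vocab() == tok_standalone.get_vocab()
print(len(tok_standalone))  # should match the 65,536 vocabulary size listed above
```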
How to use (Transformers)
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Bochkov/emergent-semantics-model-1024-float-335m")
model = AutoModelForCausalLM.from_pretrained(
    "Bochkov/emergent-semantics-model-1024-float-335m", trust_remote_code=True
).to('cuda')  # move the model to the same device as the inputs below

inputs = torch.tensor(
    [tokenizer.encode("Question: What is the capital of Japan?\nAnswer:")],
    dtype=torch.long, device='cuda'
)
outputs = model.generate(
    inputs,
    max_new_tokens=10,
    do_sample=False  # greedy decoding
)
print(tokenizer.decode(outputs[0].tolist()))
```
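For non-greedy continuations, the same `generate` call also supports sampling; the decoding parameters below are illustrative and not taken from the paper.

```python
# Sampled decoding (illustrative parameters, not from the paper).
outputs = model.generate(
    inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0].tolist()))
```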
Intended use
Research-only checkpoint intended for:
- Studying emergent semantics with a frozen random float codebook
- Isolating the impact of normalization / vector scale in frozen embeddings
- Comparisons against 1024_BIT and UNI_GLYPH under identical backbone/training conditions
Not intended for production deployment (no safety/instruction tuning).
Related links
- Model collection (paper artifacts): https://huggingface.co/collections/Bochkov/emergent-semantics-beyond-token-embeddings
- UNI_GLYPH model (frozen visual glyph embeddings): https://huggingface.co/Bochkov/emergent-semantics-model-uni-glyph-335m
- 1024_BIT model (binary random frozen embeddings): https://huggingface.co/Bochkov/emergent-semantics-model-1024-bit-335m
- Tokenizer: https://huggingface.co/Bochkov/bvv241-2-3
- Code (GitHub): https://github.com/AVBochkov/Embeddings
🧑‍🔬 Citation & Concept
If you use this model or the underlying concepts in your research, please cite our work:
```bibtex
@article{bochkov2025emergent,
  title={Emergent Semantics Beyond Token Embeddings: Transformer {LM}s with Frozen Visual Unicode Representations},
  author={Andrey Bochkov},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=Odh8IynO1o},
  note={}
}

@misc{bochkov2025growingtransformersmodularcomposition,
  title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
  author={A. Bochkov},
  year={2025},
  eprint={2507.07129},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2507.07129},
}
```