Pythia-31M-seed4 GPT-NeoX Checkpoints
This repository contains the raw GPT-NeoX training checkpoints for Pythia-31M-seed4, part of the PolyPythias suite. These are the native checkpoint files produced during training, stored in DeepSpeed's checkpoint format.
If you want to perform inference, use the HuggingFace Transformers-compatible weights at EleutherAI/pythia-31m-seed4 instead. This repository is intended for research that requires access to optimizer states or the original training format.
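For reference, a minimal inference sketch using those pre-converted weights (this assumes the `transformers` library is installed; the prompt and generation settings are illustrative only):

```python
# Minimal inference sketch using the pre-converted HuggingFace weights,
# not the raw GPT-NeoX checkpoints stored in this repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-31m-seed4")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-31m-seed4")

inputs = tokenizer("The Pile is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```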
Contents
Each branch contains a full training checkpoint at a given step, including:
- `layer_XX-model_00-model_states.pt` – model weight shards (one per layer)
- `mp_rank_00_model_states.pt` – model state metadata
- `zero_pp_rank_*_optim_states.pt` – ZeRO optimizer states (Adam moments, etc.)
- `31M.yml` – GPT-NeoX training configuration
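If you need to inspect a raw checkpoint file directly, it can be opened with `torch.load`. The sketch below is a rough illustration and assumes the file has been downloaded locally; the exact dictionary layout inside each file depends on the DeepSpeed version used during training.

```python
# Rough sketch: inspect one GPT-NeoX/DeepSpeed checkpoint shard on disk.
# The path is a placeholder, and the internal key layout is an assumption
# that may vary across DeepSpeed versions.
import torch

shard = torch.load("layer_00-model_00-model_states.pt", map_location="cpu")
for name, value in shard.items():
    if hasattr(value, "shape"):
        print(name, tuple(value.shape), value.dtype)
    else:
        print(name, type(value))
```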
Branches
154 checkpoints are available as branches:
- `step0` – initialization
- `step{1,2,4,8,16,32,64,128,256,512}` – log-spaced early checkpoints
- `step1000` through `step143000` – every 1,000 steps
Branch step143000 corresponds to the final model.
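To fetch a single checkpoint branch without cloning the whole repository, `huggingface_hub.snapshot_download` with the `revision` argument is one option. A sketch (the repository ID below is a placeholder; substitute this repository's ID on the Hub):

```python
# Sketch: download one checkpoint branch, e.g. the final step.
# REPO_ID is a placeholder; replace it with this repository's Hub ID.
from huggingface_hub import snapshot_download

REPO_ID = "EleutherAI/..."  # placeholder, not a real repository ID
local_dir = snapshot_download(repo_id=REPO_ID, revision="step143000")
print("Checkpoint files downloaded to:", local_dir)
```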
Converting to HuggingFace Format
To convert a checkpoint to HuggingFace Transformers format, use the conversion script from GPT-NeoX:
python tools/convert_neox_to_hf.py \
--input_dir /path/to/neox/checkpoint \
--config_file /path/to/config.yml \
--output_dir /path/to/hf/output
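After conversion, the output directory can be loaded with `transformers` to confirm the weights round-trip. A minimal sketch, with the path standing in for the `--output_dir` used above:

```python
# Sketch: sanity-check a converted checkpoint by loading it from disk.
# The path is a placeholder for the conversion script's --output_dir.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/path/to/hf/output")
print(sum(p.numel() for p in model.parameters()), "parameters")
```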
Pre-converted weights for all checkpoints are available at EleutherAI/pythia-31m-seed4.
Training Details
This model was trained on the Pile using a pre-shuffled data ordering specific to this seed. The shuffled index files are available at EleutherAI/pile-preshuffled-seeds.
All PolyPythias models were trained for 143,000 steps with a batch size of 2M tokens (2,097,152 tokens per step), seeing a total of 299,892,736,000 tokens. See the PolyPythias paper and Pythia GitHub repository for full training details.
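The total token count follows directly from the step count and batch size; a trivial arithmetic check:

```python
# Sanity check: 143,000 steps * 2,097,152 tokens per step.
steps = 143_000
tokens_per_step = 2_097_152  # 2 * 1024**2
print(steps * tokens_per_step)  # 299892736000
```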
| Model Size | Parameters | Layers | Model Dim | Heads | Original Model |
|---|---|---|---|---|---|
| 14M | 14M | 6 | 128 | 4 | pythia-14m |
| 31M | 31M | 6 | 256 | 8 | pythia-31m |
| 70M | 70M | 6 | 512 | 8 | pythia-70m |
| 160M | 160M | 12 | 768 | 12 | pythia-160m |
| 410M | 410M | 24 | 1024 | 16 | pythia-410m |
About PolyPythias
PolyPythias is an extension of the Pythia project providing 45 additional training runs across 5 model sizes with 9 different random seeds each. These models enable systematic study of training stability and reproducibility in language models. The 160M size also includes decoupled variants (data-seed and weight-seed) that isolate the effects of data ordering vs. weight initialization.
The complete collection is available at: EleutherAI/polypythias
Citation
@inproceedings{vanderwal2025polypythias,
title={PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs},
author={van der Wal, Oskar and Lesci, Pietro and Muller-Eberstein, Max and Saphra, Naomi and Schoelkopf, Hailey and Zuidema, Willem and Biderman, Stella},
booktitle={International Conference on Learning Representations},
year={2025},
url={https://arxiv.org/abs/2503.09543}
}