PyTorch
English
causal-lm
pythia
polypythias
gpt-neox

Pythia-31M-seed4 GPT-NeoX Checkpoints

This repository contains the raw GPT-NeoX training checkpoints for Pythia-31M-seed4, part of the PolyPythias suite. These are the native checkpoint files produced during training, stored in DeepSpeed's checkpoint format.

If you want to perform inference, use the HuggingFace Transformers-compatible weights at EleutherAI/pythia-31m-seed4 instead. This repository is intended for research that requires access to optimizer states or the original training format.

Contents

Each branch contains a full training checkpoint at a given step, including:

  • layer_XX-model_00-model_states.pt – model weight shards (one per layer)
  • mp_rank_00_model_states.pt – model state metadata
  • zero_pp_rank_*_optim_states.pt – ZeRO optimizer states (Adam moments, etc.)
  • 31M.yml – GPT-NeoX training configuration
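
For quick inspection, the raw shards can be opened directly with PyTorch. This is a minimal sketch, assuming a checkpoint branch has already been downloaded locally; the file name is illustrative and follows the layer_XX pattern above.

import torch

# Open one layer shard from a locally downloaded checkpoint branch.
# The file name is illustrative; use whichever layer_XX shard you downloaded.
# (The optimizer/state files may contain pickled non-tensor objects and can
# require weights_only=False on recent PyTorch versions.)
shard = torch.load("layer_00-model_00-model_states.pt", map_location="cpu")

# Each layer shard is a dictionary of parameter tensors for that layer.
for name, value in shard.items():
    if torch.is_tensor(value):
        print(name, tuple(value.shape), value.dtype)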

Branches

154 checkpoints are available as branches:

  • step0 – initialization
  • step{1,2,4,8,16,32,64,128,256,512} – log-spaced early checkpoints
  • step1000 through step143000 – every 1,000 steps

Branch step143000 corresponds to the final model.
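
To fetch a single checkpoint, pass the branch name as the revision when downloading. A minimal sketch using huggingface_hub; the local directory name is just an example.

from huggingface_hub import snapshot_download

# Download one checkpoint branch (e.g. step1000) from this repository.
snapshot_download(
    repo_id="EleutherAI/neox-ckpt-pythia-31m-seed4",
    revision="step1000",  # any branch listed above, e.g. "step143000"
    local_dir="neox-ckpt-pythia-31m-seed4-step1000",
)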

Converting to HuggingFace Format

To convert a checkpoint to HuggingFace Transformers format, use the conversion script from GPT-NeoX:

python tools/convert_neox_to_hf.py \
    --input_dir /path/to/neox/checkpoint \
    --config_file /path/to/config.yml \
    --output_dir /path/to/hf/output

Pre-converted weights for all checkpoints are available at EleutherAI/pythia-31m-seed4.
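
For inference, the pre-converted weights can be loaded straight from that repository with transformers. A minimal sketch, assuming the HuggingFace-format repository exposes the same step branches as revisions:

from transformers import AutoTokenizer, GPTNeoXForCausalLM

# Load the HuggingFace-format weights for a given checkpoint step.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-31m-seed4", revision="step143000"
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-31m-seed4", revision="step143000"
)

inputs = tokenizer("The Pile is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))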

Training Details

This model was trained on the Pile using a pre-shuffled data ordering specific to this seed. The shuffled index files are available at EleutherAI/pile-preshuffled-seeds.

All PolyPythias models were trained for 143,000 steps with a batch size of 2M tokens (2,097,152 tokens per step), seeing a total of 299,892,736,000 tokens. See the PolyPythias paper and Pythia GitHub repository for full training details.
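
The total token count follows directly from the step count and per-step batch size; a quick sanity check:

# 143,000 steps x 2,097,152 tokens per step (a 2M-token batch, i.e. 2**21)
steps = 143_000
tokens_per_step = 2_097_152
print(f"{steps * tokens_per_step:,}")  # 299,892,736,000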

| Model Size | Parameters | Layers | Model Dim | Heads | Original Model |
|------------|------------|--------|-----------|-------|----------------|
| 14M        | 14M        | 6      | 128       | 4     | pythia-14m     |
| 31M        | 31M        | 6      | 256       | 8     | pythia-31m     |
| 70M        | 70M        | 6      | 512       | 8     | pythia-70m     |
| 160M       | 160M       | 12     | 768       | 12    | pythia-160m    |
| 410M       | 410M       | 24     | 1024      | 16    | pythia-410m    |

About PolyPythias

PolyPythias is an extension of the Pythia project providing 45 additional training runs across 5 model sizes with 9 different random seeds each. These models enable systematic study of training stability and reproducibility in language models. The 160M size also includes decoupled variants (data-seed and weight-seed) that isolate the effects of data ordering vs. weight initialization.

The complete collection is available at: EleutherAI/polypythias

Citation

@inproceedings{vanderwal2025polypythias,
    title={PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs},
    author={van der Wal, Oskar and Lesci, Pietro and Muller-Eberstein, Max and Saphra, Naomi and Schoelkopf, Hailey and Zuidema, Willem and Biderman, Stella},
    booktitle={International Conference on Learning Representations},
    year={2025},
    url={https://arxiv.org/abs/2503.09543}
}