d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation
Paper: arXiv:2601.07568
This repository contains d3LLM-LLaDA, an ultra-fast diffusion language model presented in the paper d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation.
d3LLM-LLaDA strikes a balance between accuracy and decoding parallelism: pseudo-trajectory distillation teaches the model which tokens can be decoded confidently at early steps, and an entropy-based multi-block decoding mechanism with KV-cache refresh accelerates inference.
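As a rough intuition for the entropy-based selection rule, the toy sketch below unmasks the positions whose predictive entropy falls below a threshold, so several tokens can be decoded in one step. The function name, tensor shapes, threshold value, and fallback rule are illustrative assumptions, not the paper's implementation.

```python
import torch

def select_confident_positions(logits: torch.Tensor,
                               masked: torch.Tensor,
                               threshold: float = 0.5):
    """Toy entropy-based selection for parallel decoding (illustrative only).

    logits: (seq_len, vocab_size) predictions for every position.
    masked: (seq_len,) bool, True where the token is still masked.
    Returns the masked positions confident enough to decode this step,
    plus the greedy token id for every position.
    """
    probs = torch.softmax(logits, dim=-1)
    # Per-position predictive entropy; low entropy = confident prediction.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    confident = masked & (entropy < threshold)
    if not confident.any():
        # Fall back to the single most confident masked position so
        # decoding always makes progress.
        masked_entropy = entropy.masked_fill(~masked, float("inf"))
        confident[masked_entropy.argmin()] = True
    return confident.nonzero(as_tuple=True)[0], probs.argmax(dim=-1)
```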
To use this model, clone the official repository and install the required dependencies:
```bash
# Clone the repository
git clone https://github.com/hao-ai-lab/d3LLM.git
cd d3LLM

# Install dependencies
pip install -r requirements.txt
```
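Once the environment is set up, a minimal inference sketch with `transformers` might look like the following. The Hub model ID, the `trust_remote_code` loading path, and the `generate()` call are assumptions based on how comparable diffusion LLMs such as LLaDA are distributed; consult the repository for the exact API.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical Hub ID; confirm the exact name in the official repository.
model_id = "hao-ai-lab/d3LLM-LLaDA"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda").eval()

prompt = "Explain diffusion language models in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Diffusion LLMs typically expose custom sampling via remote code;
# the argument names here are illustrative.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

`trust_remote_code=True` is assumed here because diffusion LLMs generally ship custom modeling and sampling code rather than a stock `transformers` architecture.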
If you find d3LLM useful for your research, please cite the following work:
```bibtex
@article{arxiv'26:d3llm,
  title   = {d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation},
  author  = {Yu-Yang Qian and Junda Su and Lanxiang Hu and Peiyuan Zhang and Zhijie Deng and Peng Zhao and Hao Zhang},
  journal = {ArXiv preprint},
  volume  = {arXiv:2601.07568},
  year    = {2026}
}
```