---
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
- bd3lm
- diffusion
- autoregressive
- language-modeling
---
# Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (ICLR 2025 Oral)
By [Marianne Arriola](https://m-arriola.com/), [Aaron Gokaslan](https://skylion007.github.io), [Justin T Chiu](https://justinchiu.netlify.app), [Zhihan Yang](https://zhihanyang2022.github.io/), [Zhixuan Qi](https://zhixuanqi.com/), [Jiaqi Han](https://hanjq17.github.io/), [Subham Sekhar Sahoo](https://s-sahoo.github.io), [Volodymyr Kuleshov](https://www.cs.cornell.edu/~kuleshov/)
[![Paper](https://img.shields.io/badge/Paper_πŸ“ƒ-green)](https://arxiv.org/abs/2503.09573)
[![GitHub](https://img.shields.io/badge/GitHub_πŸ§‘β€πŸ’»-blue)](https://github.com/kuleshov-group/bd3lms)
[![Blog](https://img.shields.io/badge/Blog_πŸ“%20%20-8A2BE2)](https://m-arriola.com/bd3lms/)
[![HuggingFace](https://img.shields.io/badge/HuggingFace_πŸ€—%20-BD3LMs%20-orange)](https://huggingface.co/collections/kuleshov-group/bd3-lms-67be95f81b96b15fec50d53f)
We introduce ***BD3-LMs***, a family of **B**lock **D**iscrete **D**enoising **D**iffusion **L**anguage **M**odels that achieve SOTA likelihoods among diffusion models and enable generation of arbitrary-length sequences. BD3-LMs combine the strengths of autoregressive and diffusion language models by decomposing a token sequence into blocks and performing discrete diffusion within each block. By tuning the block size, we interpolate between autoregressive and diffusion models, which introduces a trade-off between quality and sample efficiency. We propose a recipe for building effective BD3-LMs that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize the variance.
## Model Description
BD3-LMs are Block Discrete Denoising Diffusion Language Models. They combine the strengths of autoregressive and diffusion language models by decomposing a token sequence into blocks and performing discrete diffusion within each block.
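Concretely, the sequence is split into blocks that are modeled autoregressively, while the tokens inside each block are modeled with a discrete diffusion objective conditioned on the preceding blocks. A minimal sketch of this likelihood bound is below (our notation, not necessarily the paper's exact symbols; see the paper for the precise objective):
```
% Block-autoregressive factorization over B blocks x^1, ..., x^B of a sequence x,
% with each per-block term lower-bounded by a discrete-diffusion ELBO
% -L_diff conditioned on the previous blocks:
\log p_\theta(x) = \sum_{b=1}^{B} \log p_\theta\left(x^{b} \mid x^{<b}\right)
                \ge - \sum_{b=1}^{B} \mathcal{L}_{\mathrm{diff}}\left(x^{b} \mid x^{<b}; \theta\right)
```
Setting the block size to 1 recovers an autoregressive model, while a single block spanning the whole sequence recovers a standard discrete diffusion model.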
## How to use
See our [GitHub README](https://github.com/kuleshov-group/bd3lms), where we provide sample scripts for training, likelihood evaluation, and generation.
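For quick experimentation, a minimal loading sketch with `transformers` is shown below. It assumes (not confirmed by this card) that the checkpoint ships custom modeling code loadable with `trust_remote_code=True` and pairs with the GPT-2 tokenizer; the repository id used here is an example placeholder, so substitute any checkpoint from the BD3-LMs collection linked above.
```python
# Minimal loading sketch (assumptions: custom modeling code via trust_remote_code,
# GPT-2 tokenizer, and an example checkpoint id from the BD3-LMs collection).
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForMaskedLM.from_pretrained(
    "kuleshov-group/bd3lm-owt-block_size4",  # example checkpoint id; pick one from the collection
    trust_remote_code=True,
)

# Training, likelihood evaluation, and block-diffusion sampling are driven by the
# scripts in the GitHub repository rather than by a transformers pipeline.
```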
## Citation
```
@inproceedings{
arriola2025block,
title={Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models},
author={Marianne Arriola and Aaron Gokaslan and Justin T Chiu and Zhihan Yang and Zhixuan Qi and Jiaqi Han and Subham Sekhar Sahoo and Volodymyr Kuleshov},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://arxiv.org/abs/2503.09573}
}
```