|
--- |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
license: apache-2.0 |
|
tags: |
|
- bd3lm |
|
- diffusion |
|
- autoregressive |
|
- language-modeling |
|
--- |
|
|
|
# Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (ICLR 2025 Oral)
|
|
|
By [Marianne Arriola](https://m-arriola.com/), [Aaron Gokaslan](https://skylion007.github.io), [Justin T Chiu](https://justinchiu.netlify.app), [Zhihan Yang](https://zhihanyang2022.github.io/), [Zhixuan Qi](https://zhixuanqi.com/), [Jiaqi Han](https://hanjq17.github.io/), [Subham Sekhar Sahoo](https://s-sahoo.github.io), [Volodymyr Kuleshov](https://www.cs.cornell.edu/~kuleshov/) |
|
|
|
[Paper](https://arxiv.org/abs/2503.09573)

[Code](https://github.com/kuleshov-group/bd3lms)

[Project Page](https://m-arriola.com/bd3lms/)

[Model Collection](https://huggingface.co/collections/kuleshov-group/bd3-lms-67be95f81b96b15fec50d53f)
|
|
|
|
|
We introduce ***BD3-LMs***, a family of **B**lock **D**iscrete **D**enoising **D**iffusion **L**anguage **M**odels that achieve state-of-the-art likelihoods among diffusion models and enable generation of arbitrary-length sequences. BD3-LMs combine the strengths of autoregressive and diffusion language models by decomposing a token sequence into blocks and performing discrete diffusion within each block. Tuning the block size interpolates between autoregressive and diffusion models, introducing a trade-off between sample quality and generation efficiency. We propose a recipe for building effective BD3-LMs that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules that minimize this variance.
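To make the block decomposition concrete, here is a toy sketch (not the paper's training code) of how a sequence can be split into contiguous blocks, with each block corrupted at its own noise level. The `MASK` sentinel, the uniform noise level, and the function name are placeholder assumptions; note how `block_size=1` corrupts one token at a time (autoregressive-like), while `block_size=len(tokens)` applies a single noise level to the whole sequence (full diffusion).

```python
import random

MASK = -1  # placeholder mask-token id, not the real vocabulary's


def noise_blocks(tokens, block_size, rng):
    """Split `tokens` into contiguous blocks; within each block, mask every
    token independently with that block's own noise level t ~ U(0, 1)."""
    noised = []
    for start in range(0, len(tokens), block_size):
        block = tokens[start:start + block_size]
        t = rng.random()  # per-block noise level
        noised.extend(MASK if rng.random() < t else tok for tok in block)
    return noised


tokens = list(range(16))
# block_size=4: four blocks, each with its own corruption level
print(noise_blocks(tokens, block_size=4, rng=random.Random(0)))
```

The corruption keeps the sequence length fixed; a denoiser is then trained to recover each block's original tokens conditioned on all preceding (clean) blocks.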
|
|
|
## Model Description |
|
BD3-LMs are Block Discrete Denoising Diffusion Language Models. They combine the strengths of autoregressive and diffusion language models by decomposing a token sequence into blocks and performing discrete diffusion within each block. |
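Generation then proceeds block by block, which is what allows arbitrary-length sequences: each new block starts fully masked and is iteratively denoised conditioned on everything generated so far. The sketch below is a toy illustration of that loop only; `toy_denoiser`, the re-masking rule, and `steps_per_block` are stand-in assumptions, not the model's actual sampler (fewer denoising steps per block trades quality for speed).

```python
import random

MASK = -1  # placeholder mask-token id


def toy_denoiser(context, block):
    """Stand-in for the learned model: fills each masked position with a
    token derived from the context length. The real BD3-LM predicts tokens
    with a transformer conditioned on all previous blocks."""
    return [len(context) + i if tok == MASK else tok
            for i, tok in enumerate(block)]


def sample(num_blocks, block_size, steps_per_block=4, seed=0):
    rng = random.Random(seed)
    seq = []
    for _ in range(num_blocks):
        block = [MASK] * block_size          # start from a fully masked block
        for _ in range(steps_per_block):     # iterative denoising within the block
            proposal = toy_denoiser(seq, block)
            # toy rule: randomly re-mask half the positions and refine again
            block = [tok if rng.random() > 0.5 else MASK for tok in proposal]
        seq.extend(toy_denoiser(seq, block))  # final pass fills remaining masks
    return seq


print(sample(num_blocks=3, block_size=4))
```

Setting `block_size=1` collapses the inner loop to per-token generation (autoregressive), while a single block covering the whole sequence recovers vanilla diffusion sampling with a fixed length.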
|
|
|
## How to use |
|
See our [GitHub README](https://github.com/kuleshov-group/bd3lms), where we provide sample scripts for training, likelihood evaluation, and generation. |
|
|
|
## Citation |
|
```
@inproceedings{arriola2025block,
  title={Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models},
  author={Marianne Arriola and Aaron Gokaslan and Justin T Chiu and Zhihan Yang and Zhixuan Qi and Jiaqi Han and Subham Sekhar Sahoo and Volodymyr Kuleshov},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://arxiv.org/abs/2503.09573}
}
```