Bai-YT/ConsistencyTTA

Bai-YT committed on Feb 8, 2024

Commit 5a0dcf3 (verified) · 1 Parent(s): 87f0615

Update README.md

Files changed (1)

README.md +41 -0

README.md CHANGED

@@ -1,3 +1,44 @@
 ---
 license: cc-by-nc-nd-4.0
 ---

+# ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
+This page shares the official model checkpoints of the paper \
+"Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation" \
+from Microsoft Applied Science Group and UC Berkeley \
+by [Yatong Bai](https://bai-yt.github.io),
+[Trung Dang](https://www.microsoft.com/applied-sciences/people/trung-dang),
+[Dung Tran](https://www.microsoft.com/applied-sciences/people/dung-tran),
+[Kazuhito Koishida](https://www.microsoft.com/applied-sciences/people/kazuhito-koishida),
+and [Somayeh Sojoudi](https://people.eecs.berkeley.edu/~sojoudi/).
+**[[Preprint Paper](https://arxiv.org/abs/2309.10740)]** &nbsp;&nbsp;&nbsp;&nbsp;
+**[[Project Homepage](https://consistency-tta.github.io)]** &nbsp;&nbsp;&nbsp;&nbsp;
+**[[Code](https://github.com/Bai-YT/ConsistencyTTA)]** &nbsp;&nbsp;&nbsp;&nbsp;
+**[[Model Checkpoints](https://huggingface.co/Bai-YT/ConsistencyTTA)]** &nbsp;&nbsp;&nbsp;&nbsp;
+**[[Generation Examples](https://consistency-tta.github.io/demo.html)]**
+## Description
+This work proposes a *consistency distillation* framework for training
+text-to-audio (TTA) generation models that require only a single neural network query,
+reducing the computation of the core step of diffusion-based TTA models by a factor of 400.
+By incorporating *classifier-free guidance* into the distillation framework,
+our models retain diffusion models' impressive generation quality and diversity.
+Furthermore, the non-recurrent, differentiable structure of the consistency model
+allows for end-to-end fine-tuning with novel loss functions such as the CLAP score, further boosting performance.
+## Model Details
+We share three model checkpoints:
+- ConsistencyTTA directly distilled from a diffusion model;
+- The above ConsistencyTTA model fine-tuned by optimizing the CLAP score;
+- The diffusion teacher model from which ConsistencyTTA is distilled.
+
+These model checkpoints can be found on our [Hugging Face page](https://huggingface.co/Bai-YT/ConsistencyTTA).
+After downloading and unzipping the files, place them in the `saved` directory.
+Please refer to our [GitHub page](https://github.com/Bai-YT/ConsistencyTTA) for usage details.
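
The download-and-unzip step described above could be scripted roughly as follows. This is a minimal sketch, not part of the official repository: it assumes the checkpoints are distributed as `.zip` archives, and the helper names `unpack_checkpoints` and `download_and_unpack` and the `downloads` folder are hypothetical. `snapshot_download` comes from the `huggingface_hub` package.

```python
import zipfile
from pathlib import Path


def unpack_checkpoints(download_dir, saved_dir="saved"):
    """Extract every .zip archive found under download_dir into saved_dir.

    Hypothetical helper: assumes the checkpoints ship as .zip archives.
    Returns the names of the archives that were extracted.
    """
    download_dir = Path(download_dir)
    saved_dir = Path(saved_dir)
    saved_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(download_dir.rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(saved_dir)
        extracted.append(archive.name)
    return extracted


def download_and_unpack(repo_id="Bai-YT/ConsistencyTTA",
                        download_dir="downloads",
                        saved_dir="saved"):
    """Download the model repository, then unpack its archives into saved_dir.

    The download_dir default is an assumption made for this sketch.
    """
    # Imported lazily so unpack_checkpoints works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=repo_id, local_dir=download_dir)
    return unpack_checkpoints(local_path, saved_dir)
```

Calling `download_and_unpack()` from the root of a local clone of the code repository should leave the extracted files under `saved`. The exact layout expected inside `saved` is not specified here, so check the GitHub page before running inference.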