---
license: cc-by-nc-nd-4.0
---
# ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

This page shares the official model checkpoints of the paper
"Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation"
from the Microsoft Applied Sciences Group and UC Berkeley,
by [Yatong Bai](https://bai-yt.github.io),
[Trung Dang](https://www.microsoft.com/applied-sciences/people/trung-dang),
[Dung Tran](https://www.microsoft.com/applied-sciences/people/dung-tran),
[Kazuhito Koishida](https://www.microsoft.com/applied-sciences/people/kazuhito-koishida),
and [Somayeh Sojoudi](https://people.eecs.berkeley.edu/~sojoudi/).

**[[Preprint Paper](https://arxiv.org/abs/2309.10740)]**  
**[[Project Homepage](https://consistency-tta.github.io)]**  
**[[Code](https://github.com/Bai-YT/ConsistencyTTA)]**  
**[[Model Checkpoints](https://huggingface.co/Bai-YT/ConsistencyTTA)]**  
**[[Generation Examples](https://consistency-tta.github.io/demo.html)]**

## Description

This work proposes a *consistency distillation* framework to train
text-to-audio (TTA) generation models that require only a single neural network query,
reducing the computation of the core step of diffusion-based TTA models by a factor of 400.
By incorporating *classifier-free guidance* into the distillation framework,
our models retain the impressive generation quality and diversity of diffusion models.
Furthermore, the non-recurrent differentiable structure of the consistency model
allows for end-to-end fine-tuning with novel loss functions such as the CLAP score,
further boosting performance.
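Classifier-free guidance steers a conditional generator by extrapolating from an unconditional prediction toward a text-conditional one; distilling this guided teacher is what lets the student keep the guided quality in a single query. Below is a minimal numerical sketch of the standard CFG combination only — the function name, array shapes, and guidance weight `w` are illustrative assumptions, not the paper's training code:

```python
import numpy as np

def cfg_combine(pred_cond, pred_uncond, w):
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one, scaled by the guidance weight w."""
    return pred_uncond + w * (pred_cond - pred_uncond)

# Toy stand-ins for a network's conditional / unconditional predictions.
rng = np.random.default_rng(0)
pred_cond = rng.standard_normal((2, 4))
pred_uncond = rng.standard_normal((2, 4))

# w > 1 extrapolates past the conditional prediction (stronger guidance);
# w = 1 recovers the conditional prediction, w = 0 the unconditional one.
guided = cfg_combine(pred_cond, pred_uncond, w=3.0)
```

Note that the guided teacher needs two network queries per step (conditional and unconditional), whereas the distilled ConsistencyTTA student produces its output in one query with no such combination at inference time.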

## Model Details

We share three model checkpoints:
- ConsistencyTTA directly distilled from a diffusion model;
- The above ConsistencyTTA model fine-tuned by optimizing the CLAP score;
- The diffusion teacher model from which ConsistencyTTA is distilled.

These model checkpoints can be found on our [Huggingface page](https://huggingface.co/Bai-YT/ConsistencyTTA).
After downloading and unzipping the files, place them in the `saved` directory.

Please refer to our [GitHub page](https://github.com/Bai-YT/ConsistencyTTA) for usage details.