hubertsiuzdak reach-vb HF staff committed on
Commit
0feb3fd
1 Parent(s): a91e656

Update README.md (#1)

- Update README.md (b62e72bf0928bcf6c7ad418a4204007f0d8d7b1d)


Co-authored-by: Vaibhav Srivastav

Files changed (1)
  1. README.md +68 -0
README.md CHANGED
@@ -1,3 +1,71 @@
1
  ---
2
  license: mit
3
  ---
4
+
5
+ # Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
6
+
7
+ [Audio samples](https://charactr-platform.github.io/vocos/) |
8
+ Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)
9
+
10
+ Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative
11
+ Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike typical
12
+ GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral
13
+ coefficients, which enables rapid audio reconstruction through the inverse Fourier transform.
14
+
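To make the idea of reconstructing audio from spectral coefficients concrete, here is a minimal pure-Python sketch. It is not the Vocos implementation: it handles a single toy frame rather than the overlapping short-time frames a real vocoder would use, and the helper names `dft` and `inverse_dft` are hypothetical. It only shows that a frame can be recovered from per-bin magnitude and phase by an inverse Fourier transform.

```python
import cmath
import math


def dft(frame):
    """Forward discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(magnitudes, phases):
    """Rebuild time-domain samples from per-bin magnitude and phase."""
    n = len(magnitudes)
    # Combine magnitude and phase into complex spectral coefficients.
    coeffs = [cmath.rect(m, p) for m, p in zip(magnitudes, phases)]
    return [
        (sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Toy frame: one cycle of a sine wave sampled at 8 points.
frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]

# Stand-in for what a model would predict: magnitude and phase per bin.
spectrum = dft(frame)
magnitudes = [abs(c) for c in spectrum]
phases = [cmath.phase(c) for c in spectrum]

# No sample-by-sample generation is needed: one inverse transform
# yields the whole frame at once.
reconstructed = inverse_dft(magnitudes, phases)
```

In this toy setting `reconstructed` matches `frame` up to floating-point error, because the magnitudes and phases come from an exact forward transform rather than from a trained network.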
15
+ ## Installation
16
+
17
+ To use Vocos only in inference mode, install it using:
18
+
19
+ ```bash
20
+ pip install vocos
21
+ ```
22
+
23
+ If you wish to train the model, install it with additional dependencies:
24
+
25
+ ```bash
26
+ pip install "vocos[train]"
27
+ ```
28
+
29
+ ## Usage
30
+
31
+ ### Reconstruct audio from mel-spectrogram
32
+
33
+ ```python
34
+ import torch
35
+
36
+ from vocos import Vocos
37
+
38
+ vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")
39
+
40
+ mel = torch.randn(1, 100, 256) # B, C, T
41
+ audio = vocos.decode(mel)
42
+ ```
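As a rough guide to how the time dimension `T` of the mel tensor maps to output length, the sketch below estimates the number of samples and the duration. The hop length of 256 samples is an assumption, not something stated in this README, so check the model's configuration before relying on it. The 24 kHz sample rate is taken from the model name. The helper names are hypothetical, and the exact output length depends on padding and may differ by about one hop.

```python
def expected_num_samples(num_frames, hop_length=256):
    """Approximate waveform length for a given number of spectrogram frames.

    hop_length=256 is an assumed value, not confirmed by the README.
    """
    return num_frames * hop_length


def expected_duration_seconds(num_frames, hop_length=256, sample_rate=24000):
    """Approximate duration in seconds at the given sample rate."""
    return expected_num_samples(num_frames, hop_length) / sample_rate


frames = 256  # T in the (B, C, T) mel tensor from the example above
samples = expected_num_samples(frames)
seconds = expected_duration_seconds(frames)
```

Under these assumptions, the random 256-frame input above would correspond to roughly 2.7 seconds of audio.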
43
+
44
+ Copy-synthesis from a file:
45
+
46
+ ```python
47
+ import torchaudio
48
+
49
+ y, sr = torchaudio.load(YOUR_AUDIO_FILE)
50
+ if y.size(0) > 1: # mix to mono
51
+     y = y.mean(dim=0, keepdim=True)
52
+ y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)
53
+ y_hat = vocos(y)
54
+ ```
55
+
56
+ ## Citation
57
+
58
+ If this code contributes to your research, please cite our work:
59
+
60
+ ```
61
+ @article{siuzdak2023vocos,
62
+ title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
63
+ author={Siuzdak, Hubert},
64
+ journal={arXiv preprint arXiv:2306.00814},
65
+ year={2023}
66
+ }
67
+ ```
68
+
69
+ ## License
70
+
71
+ The code in this repository is released under the MIT license.