---
license: mit
library: ONNX
base_model: charactr/vocos-mel-24khz
---

**Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis**

**Audio samples | Paper [abs] [pdf]**

Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained with a Generative Adversarial Network (GAN) objective, Vocos generates waveforms in a single forward pass. Unlike typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, which allows rapid audio reconstruction through the inverse Fourier transform.

This is an ONNX version of the original 24 kHz mel spectrogram [model](https://huggingface.co/charactr/vocos-mel-24khz). The model predicts spectrograms, and the ISTFT is performed outside ONNX because ISTFT is not yet implemented as an ONNX operator.

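As a rough illustration of running the ISTFT outside ONNX, the sketch below reconstructs a waveform with NumPy. It rests on several assumptions that this card does not confirm: that the exported graph has a single mel input and returns three arrays (magnitude, plus the cosine and sine of the phase), and that the STFT uses `n_fft=1024`, `hop_length=256`, a periodic Hann window, and centered frames. `decode_mel` and `istft` are hypothetical helper names; check the Colab notebook for the actual inputs, outputs, and parameters.

```python
import numpy as np


def istft(spec, n_fft=1024, hop_length=256):
    """Inverse STFT by windowed overlap-add.

    spec: complex array of shape (n_fft // 2 + 1, n_frames), assumed to come
    from a centered STFT with a periodic Hann window (assumed parameters).
    """
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    # Back to the time domain, one frame per column, then apply the window.
    frames = np.fft.irfft(spec, n=n_fft, axis=0) * window[:, None]
    n_frames = spec.shape[1]
    length = n_fft + hop_length * (n_frames - 1)
    audio = np.zeros(length)
    norm = np.zeros(length)
    for i in range(n_frames):
        start = i * hop_length
        audio[start:start + n_fft] += frames[:, i]
        norm[start:start + n_fft] += window ** 2
    audio /= np.maximum(norm, 1e-11)  # undo the squared-window overlap
    pad = n_fft // 2  # remove the padding added by the centered STFT
    return audio[pad:length - pad]


def decode_mel(session, mel):
    """Run the ONNX model and apply the ISTFT outside the graph.

    session: an onnxruntime.InferenceSession for the exported model.
    mel: float array of shape (batch, n_mels, n_frames).
    Assumption: the model returns (magnitude, cos(phase), sin(phase)).
    """
    input_name = session.get_inputs()[0].name
    mag, x, y = session.run(None, {input_name: mel.astype(np.float32)})
    spec = mag * (x + 1j * y)  # complex coefficients, (batch, freq, frames)
    return np.stack([istft(s) for s in spec])
```

With `onnxruntime` installed, a session could be created with `onnxruntime.InferenceSession(path_to_model)` and passed to `decode_mel` together with a mel spectrogram; the returned array would hold one waveform per batch item.
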
## Usage

Try it out in Colab:

<a target="_blank" href="https://colab.research.google.com/drive/1J1tWd56D7CPwmVCP-pbMNzlRWYvlyADN">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Citation

```
@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}
```