wetdog/vocos-mel-24khz-onnx


	---
	license: mit
	library: ONNX
	base_model: charactr/vocos-mel-24khz
	---

	Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

	Audio samples \| Paper [abs] [pdf]

	Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained with a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, which allows rapid audio reconstruction through the inverse Fourier transform.

	This is an ONNX version of the original 24 kHz mel spectrogram [model](https://huggingface.co/charactr/vocos-mel-24khz). The ONNX graph predicts spectrograms only; the inverse short-time Fourier transform (ISTFT) is performed outside ONNX because ISTFT is not yet implemented as an ONNX operator.
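Because the ISTFT runs outside the ONNX graph, the caller has to turn the predicted spectrogram into a waveform. The following is a minimal NumPy sketch of that step, not the code used in the notebook. It assumes a complex spectrogram of shape `(n_fft // 2 + 1, n_frames)`, centered frames, a periodic Hann window, and `n_fft=1024` with `hop_length=256`; these values and the function name `istft` are assumptions for illustration, so check them against the model configuration and the actual output tensors before relying on them.

```python
import numpy as np


def istft(spec, n_fft=1024, hop_length=256):
    """Inverse STFT by windowed overlap-add.

    spec: complex array of shape (n_fft // 2 + 1, n_frames).
    n_fft and hop_length are assumed values, not read from the model.
    """
    # Periodic Hann window of length n_fft.
    window = np.hanning(n_fft + 1)[:-1]

    # Back to the time domain, one frame per column, then apply the window.
    frames = np.fft.irfft(spec, n=n_fft, axis=0) * window[:, None]

    n_frames = spec.shape[1]
    out_len = n_fft + hop_length * (n_frames - 1)
    audio = np.zeros(out_len)
    envelope = np.zeros(out_len)

    # Overlap-add the frames and accumulate the squared window for normalization.
    for t in range(n_frames):
        start = t * hop_length
        audio[start:start + n_fft] += frames[:, t]
        envelope[start:start + n_fft] += window ** 2

    # Drop the half-window padding that centered framing adds on each side.
    pad = n_fft // 2
    audio = audio[pad:-pad]
    envelope = envelope[pad:-pad]
    return audio / np.maximum(envelope, 1e-11)


# Hypothetical stand-ins for the model outputs: a magnitude and a phase array.
mag = np.ones((513, 10))
phase = np.zeros((513, 10))
spec = mag * (np.cos(phase) + 1j * np.sin(phase))

audio = istft(spec)
print(audio.shape)  # (2304,), i.e. hop_length * (n_frames - 1) samples
```

In practice `spec` would be built from the arrays returned by the ONNX session rather than from the placeholder arrays above. A library ISTFT such as `torch.istft` can replace the loop, provided its window and centering settings match those used at training time.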

	## Usage

	Try it out in Colab:

	<a target="_blank" href="https://colab.research.google.com/drive/1J1tWd56D7CPwmVCP-pbMNzlRWYvlyADN">
	<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
	</a>

	## Citation

	```
	@article{siuzdak2023vocos,
	  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
	  author={Siuzdak, Hubert},
	  journal={arXiv preprint arXiv:2306.00814},
	  year={2023}
	}
	```