---
license: mit
tags:
- DAC
- Descript Audio Codec
- PyTorch
---

|
# Descript Audio Codec (DAC)

DAC is a state-of-the-art audio tokenizer that improves upon previous tokenizers such as SoundStream and EnCodec.

This model card provides an easy-to-use API for a *pretrained DAC* [1] for 16 kHz audio, whose backbone and pretrained weights come from [its original repository](https://github.com/descriptinc/descript-audio-codec). With this API, you can encode and decode with a single line of code, on either CPU or GPU. Furthermore, it supports chunk-based processing for memory-efficient encoding and decoding, which is especially important on GPU.
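
The chunk-based idea can be pictured independently of this model's API: split the signal into fixed-size windows, process each one, and concatenate the results, so only one chunk needs to be held in working memory at a time. A minimal pure-Python sketch (the chunk size and the doubling `process` function are illustrative, not part of the DAC API):

```python
def process_in_chunks(samples, chunk_size, process):
    """Apply `process` to fixed-size chunks and concatenate the results."""
    out = []
    for start in range(0, len(samples), chunk_size):
        out.extend(process(samples[start:start + chunk_size]))
    return out

# toy example: double every sample, 4 samples per chunk
signal = list(range(10))
result = process_in_chunks(signal, 4, lambda chunk: [x * 2 for x in chunk])
# result == [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```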
### Model variations

There are three model variants, depending on the input audio sampling rate.

| Model | Input audio sampling rate |
| ------------------ | ----------------- |
| [`hance-ai/descript-audio-codec-44khz`](https://huggingface.co/hance-ai/descript-audio-codec-44khz) | 44.1 kHz |
| [`hance-ai/descript-audio-codec-24khz`](https://huggingface.co/hance-ai/descript-audio-codec-24khz) | 24 kHz |
| [`hance-ai/descript-audio-codec-16khz`](https://huggingface.co/hance-ai/descript-audio-codec-16khz) | 16 kHz |

# Dependency

See `requirements.txt`.

# Usage

### Load
```python
from transformers import AutoModel

# device setting
device = 'cpu'  # or 'cuda:0'

# load the pretrained model
model = AutoModel.from_pretrained('hance-ai/descript-audio-codec-16khz', trust_remote_code=True)
model.to(device)
```

### Encode
```python
audio_filename = 'path/example_audio.wav'
zq, s = model.encode(audio_filename)
```
`zq` holds the quantized embeddings with shape (1, num_RVQ_codebooks, token_length), and `s` is the corresponding discrete token sequence with the same shape.
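
As a rule of thumb, `token_length` scales with the audio length divided by the encoder's hop size. The helper below is an illustration rather than part of the model API, and the hop length of 320 samples is a hypothetical value for the 16 kHz variant:

```python
import math

def expected_token_length(num_samples: int, hop_length: int) -> int:
    # one token (per RVQ codebook) is produced per `hop_length` input samples
    return math.ceil(num_samples / hop_length)

# 1 second of 16 kHz audio with a hypothetical hop length of 320 samples
n_tokens = expected_token_length(16000, 320)  # -> 50 tokens per codebook
```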

### Decode
```python
# decoding from `zq`
waveform = model.decode(zq=zq)  # (1, 1, audio_length); the output is mono.

# decoding from `s`
waveform = model.decode(s=s)  # (1, 1, audio_length); the output is mono.
```
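
Since the last dimension of the decoded tensor is the sample count, the clip duration follows directly from the sampling rate; a small helper, where the default of 16 000 Hz matches this 16 kHz variant:

```python
def duration_seconds(audio_length: int, sampling_rate: int = 16000) -> float:
    # a decoded tensor of shape (1, 1, audio_length) holds `audio_length` mono samples
    return audio_length / sampling_rate

dur = duration_seconds(32000)  # -> 2.0 seconds at 16 kHz
```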

### Save a waveform as an audio file
```python
model.waveform_to_audiofile(waveform, 'out.wav')
```

### Save and load tokens
```python
model.save_tensor(s, 'tokens.pt')
loaded_s = model.load_tensor('tokens.pt')
```
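
These helpers presumably wrap PyTorch serialization; the round-trip pattern itself is generic. A standard-library sketch of the same idea (the `pickle`-based helpers below are illustrative stand-ins, not the model's actual implementation):

```python
import os
import pickle
import tempfile

def save_obj(obj, path):
    # serialize `obj` to disk; `model.save_tensor` plays the analogous role
    with open(path, 'wb') as f:
        pickle.dump(obj, f)

def load_obj(path):
    # restore the object; `model.load_tensor` plays the analogous role
    with open(path, 'rb') as f:
        return pickle.load(f)

tokens = [[1, 2, 3], [4, 5, 6]]  # stand-in for a token tensor
path = os.path.join(tempfile.mkdtemp(), 'tokens.pkl')
save_obj(tokens, path)
restored = load_obj(path)
```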
# References

[1] Kumar, Rithesh, et al. "High-Fidelity Audio Compression with Improved RVQGAN." Advances in Neural Information Processing Systems 36 (2024).

<!-- contributions
- chunk processing
- add device parameter in the test notebook
-->