Update README.md
README.md
CHANGED
@@ -8,7 +8,27 @@ license: mit
|
|
8 |
|
9 |
# Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN
|
10 |
|
11 |
-
This repository
12 |
|
13 |
[arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN
|
14 |
](http://arxiv.org/abs/2306.06546) <br>
|
@@ -21,8 +41,6 @@ This repository contains training and inference scripts for the Descript Audio C
|
|
21 |
👌 It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.) <br>
|
22 |
|
23 |
|
24 |
-
## Original Usage
|
25 |
-
|
26 |
### Installation
|
27 |
```
|
28 |
pip install descript-audio-codec
|
|
|
8 |
|
9 |
# Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN
|
10 |
|
11 |
+
This repository is a wrapper around the original **Descript Audio Codec** (DAC) model, a high-fidelity general neural audio codec introduced in the paper **High-Fidelity Audio Compression with Improved RVQGAN**.
|
12 |
+
|
13 |
+
It is designed as a drop-in replacement for the [transformers implementation](https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/encodec#overview) of [EnCodec](https://github.com/facebookresearch/encodec), so that architectures built on EnCodec can be trained with DAC instead.
|
14 |
+
The [Parler-TTS library](https://github.com/huggingface/parler-tts) is an example of how to use DAC to train high-quality TTS models. We released [Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_300M_v0.1), a first-iteration model trained on 10k hours of narrated audiobooks. It generates high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).
|
15 |
+
|
16 |
+
To use this checkpoint, first install the [Parler-TTS library](https://github.com/huggingface/parler-tts) (you only need to do this once):
|
17 |
+
```sh
|
18 |
+
pip install git+https://github.com/huggingface/parler-tts.git
|
19 |
+
```
|
20 |
+
|
21 |
+
Then load the model:
|
22 |
+
```python
|
23 |
+
from parler_tts import DACModel
|
24 |
+
dac_model = DACModel.from_pretrained("parler-tts/dac_44khZ_8kbps")
|
25 |
+
```
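As a rough illustration of what the `44khZ_8kbps` part of the checkpoint name refers to, the nominal bitrate of a residual-vector-quantization codec can be estimated from its frame rate and codebook layout. The sketch below is not part of this README: the constants (a hop length of 512 samples, 9 codebooks, 1024 entries per codebook) are assumptions taken from the 44.1 kHz configuration described in the DAC paper, and `codec_bitrate_bps` is a hypothetical helper, not a function from `parler_tts` or `dac`.

```python
import math

# Assumed values for the 44.1 kHz DAC configuration (from the paper, not this README).
SAMPLE_RATE_HZ = 44_100   # input audio sampling rate
HOP_LENGTH = 512          # input samples compressed into one latent frame
NUM_CODEBOOKS = 9         # residual codebooks used per frame
CODEBOOK_SIZE = 1024      # entries per codebook, i.e. 10 bits per code


def codec_bitrate_bps(sample_rate_hz, hop_length, num_codebooks, codebook_size):
    """Hypothetical helper: nominal bitrate in bits per second."""
    frames_per_second = sample_rate_hz / hop_length
    bits_per_code = math.log2(codebook_size)
    return frames_per_second * num_codebooks * bits_per_code


bitrate = codec_bitrate_bps(SAMPLE_RATE_HZ, HOP_LENGTH, NUM_CODEBOOKS, CODEBOOK_SIZE)
print(f"{bitrate / 1000:.2f} kbps")  # prints "7.75 kbps", usually rounded to 8 kbps
```

Under these assumptions the estimate lands just under 8 kbps, which is consistent with the checkpoint name.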
|
26 |
+
|
27 |
+
|
28 |
+
🚨 If you want to use the original DAC codebase, refer to the [original repository](https://github.com/descriptinc/descript-audio-codec/tree/main) or to the [Original Usage](#original-usage) section.
|
29 |
+
|
30 |
+
|
31 |
+
## Original Usage
|
32 |
|
33 |
[arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN
|
34 |
](http://arxiv.org/abs/2306.06546) <br>
|
|
|
41 |
👌 It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.) <br>
|
42 |
|
43 |
|
|
|
|
|
44 |
### Installation
|
45 |
```
|
46 |
pip install descript-audio-codec
|