ylacombe committed on
Commit b1c0224 · verified · 1 parent: 86ce25f

Update README.md

Files changed (1)
  1. README.md +21 -3
README.md CHANGED
@@ -8,7 +8,27 @@ license: mit

  # Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN

- This repository contains training and inference scripts for the Descript Audio Codec (.dac), a high fidelity general neural audio codec, introduced in the paper titled **High-Fidelity Audio Compression with Improved RVQGAN**.
+ This repository is a wrapper around the original **Descript Audio Codec** model, a high-fidelity general neural audio codec introduced in the paper titled **High-Fidelity Audio Compression with Improved RVQGAN**.
+
+ It is designed as a drop-in replacement for the [transformers implementation](https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/encodec#overview) of [Encodec](https://github.com/facebookresearch/encodec), so that architectures that use Encodec can also be trained with DAC instead.
+ The [Parler-TTS library](https://github.com/huggingface/parler-tts) is an example of how to use DAC to train high-quality TTS models. We released [Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_300M_v0.1), a first-iteration model trained on 10k hours of narrated audiobooks. It generates high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).
+
+ To use this checkpoint, first install the [Parler-TTS library](https://github.com/huggingface/parler-tts) (this only needs to be done once):
+ ```sh
+ pip install git+https://github.com/huggingface/parler-tts.git
+ ```
+
+ Then load the model:
+ ```python
+ from parler_tts import DACModel
+ dac_model = DACModel.from_pretrained("parler-tts/dac_44khZ_8kbps")
+ ```
+
+
+ 🚨 If you want to use the original DAC codebase, refer to the [original repository](https://github.com/descriptinc/descript-audio-codec/tree/main) or to the [Original Usage](#original-usage) section.
+
+
+ ## Original Usage

  [arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN
  ](http://arxiv.org/abs/2306.06546) <br>
@@ -21,8 +41,6 @@ This repository contains training and inference scripts for the Descript Audio C
  👌 It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.) <br>


- ## Original Usage
-
  ### Installation
  ```
  pip install descript-audio-codec
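The checkpoint name `dac_44khZ_8kbps` encodes a sample rate and a bitrate. The sketch below shows how a figure of roughly 8 kbps can arise for a residual-vector-quantized codec. It assumes the configuration published for the 44.1 kHz DAC model: a hop length of 512 samples, 9 codebooks, and 1024 entries per codebook. These values are not stated in this README, so treat them as assumptions and check them against the checkpoint's config. The helper `codec_bitrate_bps` is hypothetical and is not part of `parler_tts` or `descript-audio-codec`.

```python
import math


def codec_bitrate_bps(sample_rate_hz: int, hop_length: int,
                      n_codebooks: int, codebook_size: int) -> float:
    """Hypothetical helper: bits per second emitted by an RVQ audio codec.

    Each latent frame covers `hop_length` input samples, and each of the
    `n_codebooks` quantizers emits one index per frame. Each index takes
    log2(codebook_size) bits.
    """
    frame_rate = sample_rate_hz / hop_length   # latent frames per second
    bits_per_code = math.log2(codebook_size)   # bits needed for one codebook index
    return frame_rate * n_codebooks * bits_per_code


# Assumed 44.1 kHz DAC configuration (not taken from this README):
# hop length 512, 9 codebooks, 1024 entries per codebook.
bitrate = codec_bitrate_bps(44_100, 512, 9, 1024)
print(f"{bitrate / 1000:.2f} kbps")  # prints "7.75 kbps"
```

Under these assumptions, each codebook contributes about 0.86 kbps, so the total bitrate scales linearly with the number of codebooks. Nine codebooks give about 7.75 kbps, which rounds to the "8kbps" in the checkpoint name.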