Update README.md (#1), opened by sanchit-gandhi
README.md: changed
@@ -51,23 +51,40 @@ music generation, or text to speech tasks.

## How to Get Started with the Model

-Use the following code to get started with the EnCodec model:

```python
-import
-from

-#
-

-#
-

-#
-

-#
-
```

## Training Details
@@ -142,6 +159,7 @@ quality, particularly in applications where low latency is not critical (e.g., m

**BibTeX:**

@misc{défossez2022high,
title={High Fidelity Neural Audio Compression},
author={Alexandre Défossez and Jade Copet and Gabriel Synnaeve and Yossi Adi},
@@ -150,4 +168,4 @@ quality, particularly in applications where low latency is not critical (e.g., m
archivePrefix={arXiv},
primaryClass={eess.AS}
}
-

## How to Get Started with the Model

+Use the following code to get started with the EnCodec model using a dummy example from the LibriSpeech dataset (~9 MB). First, install the required Python packages:
+
+```
+pip install --upgrade pip
+pip install --upgrade transformers datasets[audio]
+```
+
+Then load an audio sample and run a forward pass of the model:

```python
+from datasets import load_dataset, Audio
+from transformers import EncodecModel, AutoProcessor
+

+# load a demonstration dataset
+librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

+# load the model + processor (for pre-processing the audio)
+model = EncodecModel.from_pretrained("facebook/encodec_24khz")
+processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

+# cast the audio data to the correct sampling rate for the model
+librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.sampling_rate))
+audio_sample = librispeech_dummy[0]["audio"]["array"]

+# pre-process the inputs
+inputs = processor(raw_audio=audio_sample, sampling_rate=processor.sampling_rate, return_tensors="pt")
+
+# explicitly encode then decode the audio inputs
+encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])
+audio_values = model.decode(encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"])[0]
+
+# or the equivalent with a forward pass
+audio_values = model(inputs["input_values"], inputs["padding_mask"]).audio_values
```
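The `cast_column` call in the updated example resamples the LibriSpeech audio, which is recorded at 16 kHz, to the rate reported by the processor (24 kHz for `facebook/encodec_24khz`). As a rough illustration of what resampling does to the signal length, here is a minimal sketch using plain linear interpolation. This is an assumption made for illustration only: `resample_linear` is a hypothetical helper, and `datasets` relies on a dedicated audio resampling backend rather than this method.

```python
# Hypothetical linear-interpolation resampler, for illustration only.
# It is NOT what `datasets` uses internally; real resamplers also apply
# band-limited filtering to avoid aliasing/imaging artifacts.


def resample_linear(samples: list[float], orig_sr: int, target_sr: int) -> list[float]:
    """Resample a mono signal from orig_sr to target_sr by linear interpolation."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []

    # Keep the duration unchanged: n_out / target_sr == n_in / orig_sr.
    n_out = round(len(samples) * target_sr / orig_sr)
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - int(pos)
        # Blend the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    one_second_16k = [0.0] * 16_000
    resampled = resample_linear(one_second_16k, orig_sr=16_000, target_sr=24_000)
    print(len(resampled))  # 24000: still one second of audio, now at 24 kHz
```

Going from 16 kHz to 24 kHz multiplies the number of samples by 1.5 while keeping the duration the same, which is why the audio must be cast before it reaches the processor.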

## Training Details

**BibTeX:**

+```
@misc{défossez2022high,
title={High Fidelity Neural Audio Compression},
author={Alexandre Défossez and Jade Copet and Gabriel Synnaeve and Yossi Adi},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
+```
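The updated README example passes `inputs["padding_mask"]` to `model.encode`, `model.decode`, and the forward pass. With a single clip the mask carries little information; it becomes useful once several clips of different lengths are batched together. The sketch below shows the common right-padding convention (1 for real samples, 0 for padding) in plain Python. It assumes, rather than confirms, that this matches what the EnCodec processor produces; `pad_batch` is a hypothetical helper, and the real processor also returns a channel dimension in `input_values`, which this sketch leaves out.

```python
# Hypothetical helper illustrating right-padding plus a padding mask
# (1 = real sample, 0 = padding). Assumed to mirror the processor's
# convention; it is not the Transformers implementation.


def pad_batch(
    clips: list[list[float]], pad_value: float = 0.0
) -> tuple[list[list[float]], list[list[int]]]:
    """Right-pad each clip to the longest length and build a matching mask."""
    max_len = max((len(clip) for clip in clips), default=0)
    values: list[list[float]] = []
    mask: list[list[int]] = []
    for clip in clips:
        n_pad = max_len - len(clip)
        values.append(list(clip) + [pad_value] * n_pad)
        mask.append([1] * len(clip) + [0] * n_pad)
    return values, mask


if __name__ == "__main__":
    batch, padding_mask = pad_batch([[0.1, 0.2, 0.3], [0.5]])
    print(batch)         # [[0.1, 0.2, 0.3], [0.5, 0.0, 0.0]]
    print(padding_mask)  # [[1, 1, 1], [1, 0, 0]]
```

Both clips end up the same length, and the mask records which positions hold real audio rather than padding.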