Files changed (1)
  1. README.md +30 -12
README.md CHANGED
@@ -51,23 +51,40 @@ music generation, or text to speech tasks.
51
 
52
  ## How to Get Started with the Model
53
 
54
- Use the following code to get started with the EnCodec model:
55
 
56
  ```python
57
- import torch
58
- from encodec import EnCodecModel
 
59
 
60
- # Load the pre-trained EnCodec model
61
- model = EnCodecModel()
62
 
63
- # Load the audio data
64
- audio_data = torch.load('audio.pt')
 
65
 
66
- # Compress the audio
67
- audio_codes = model.encode(audio_data)[0]
 
68
 
69
- # Decompress the audio
70
- reconstructed_audio = model.decode(audio_codes)
71
  ```
72
 
73
  ## Training Details
@@ -142,6 +159,7 @@ quality, particularly in applications where low latency is not critical (e.g., m
142
 
143
  **BibTeX:**
144
 
 
145
  @misc{défossez2022high,
146
  title={High Fidelity Neural Audio Compression},
147
  author={Alexandre Défossez and Jade Copet and Gabriel Synnaeve and Yossi Adi},
@@ -150,4 +168,4 @@ quality, particularly in applications where low latency is not critical (e.g., m
150
  archivePrefix={arXiv},
151
  primaryClass={eess.AS}
152
  }
153
-
 
51
 
52
  ## How to Get Started with the Model
53
 
54
+ Use the following code to get started with the EnCodec model, using a dummy example from the LibriSpeech dataset (~9 MB). First, install the required Python packages:
55
+
56
+ ```
57
+ pip install --upgrade pip
58
+ pip install --upgrade transformers "datasets[audio]"
59
+ ```
60
+
61
+ Then load an audio sample and run a forward pass of the model:
62
 
63
  ```python
64
+ from datasets import load_dataset, Audio
65
+ from transformers import EncodecModel, AutoProcessor
66
+
67
 
68
+ # load a demonstration dataset
69
+ librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
70
 
71
+ # load the model + processor (for pre-processing the audio)
72
+ model = EncodecModel.from_pretrained("facebook/encodec_24khz")
73
+ processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")
74
 
75
+ # cast the audio data to the correct sampling rate for the model
76
+ librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.sampling_rate))
77
+ audio_sample = librispeech_dummy[0]["audio"]["array"]
78
 
79
+ # pre-process the inputs
80
+ inputs = processor(raw_audio=audio_sample, sampling_rate=processor.sampling_rate, return_tensors="pt")
81
+
82
+ # explicitly encode then decode the audio inputs
83
+ encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])
84
+ audio_values = model.decode(encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"])[0]
85
+
86
+ # or the equivalent with a forward pass
87
+ audio_values = model(inputs["input_values"], inputs["padding_mask"]).audio_values
88
  ```
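
For intuition about how compact the `audio_codes` returned by `model.encode` are, the sketch below estimates the bitrate of the discrete codes. It is a back-of-the-envelope calculation, not part of the model's API: the helper names are hypothetical, and the 75 Hz frame rate and 1024-entry codebooks are assumptions taken from the EnCodec paper for the 24 kHz model, not values stated in this README.

```python
import math

# ASSUMPTIONS (from the EnCodec paper, not from this README):
# the 24 kHz model emits 75 code frames per second, and each
# codebook has 1024 entries (10 bits per code).
FRAME_RATE_HZ = 75
CODEBOOK_SIZE = 1024


def bitrate_kbps(num_codebooks, frame_rate_hz=FRAME_RATE_HZ, codebook_size=CODEBOOK_SIZE):
    """Bitrate of the code stream in kbps (hypothetical helper)."""
    bits_per_code = math.log2(codebook_size)
    return num_codebooks * bits_per_code * frame_rate_hz / 1000.0


def raw_pcm_kbps(sampling_rate=24_000, bit_depth=16, channels=1):
    """Bitrate of uncompressed PCM audio in kbps (hypothetical helper)."""
    return sampling_rate * bit_depth * channels / 1000.0


if __name__ == "__main__":
    pcm = raw_pcm_kbps()
    for n in (2, 4, 8, 16, 32):
        rate = bitrate_kbps(n)
        print(f"{n:>2} codebooks: {rate:5.1f} kbps ({pcm / rate:.0f}x smaller than 16-bit PCM)")
```

Under these assumptions, using more codebooks raises the bitrate linearly, which is the trade-off between reconstruction quality and compression.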
89
 
90
  ## Training Details
 
159
 
160
  **BibTeX:**
161
 
162
+ ```
163
  @misc{défossez2022high,
164
  title={High Fidelity Neural Audio Compression},
165
  author={Alexandre Défossez and Jade Copet and Gabriel Synnaeve and Yossi Adi},
 
168
  archivePrefix={arXiv},
169
  primaryClass={eess.AS}
170
  }
171
+ ```