NeMo
CasanovaE committed on
Commit b61673d
1 Parent(s): d736408

Update README.md

Files changed (1)
  1. README.md +33 -0
README.md CHANGED
@@ -54,6 +54,38 @@ The model is available for use in the [NVIDIA NeMo](https://github.com/NVIDIA/Ne
 ### Inference
 For inference, you can follow our [Audio Codec Inference Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Inference.ipynb) which automatically downloads the model checkpoint. Note that you will need to set the ```model_name``` parameter to "nvidia/low-frame-rate-speech-codec-22khz".
 
+In addition, you can use the code below, which automatically downloads the checkpoint as well:
+
+```
+import librosa
+import torch
+import soundfile as sf
+from nemo.collections.tts.models import AudioCodecModel
+
+path_to_input_audio = ??? # path of the input audio
+path_to_output_audio = ??? # path of the reconstructed output audio
+
+# load audio codec
+nemo_codec_model = AudioCodecModel.from_pretrained("nvidia/low-frame-rate-speech-codec-22khz").eval()
+
+# get discrete tokens from audio
+audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
+
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+audio_tensor = torch.from_numpy(audio).unsqueeze(dim=0).to(device)
+audio_len = torch.tensor([audio_tensor[0].shape[0]]).to(device)
+
+encoded_tokens, encoded_len = nemo_codec_model.encode(audio=audio_tensor, audio_len=audio_len)
+
+# Reconstruct audio from tokens
+reconstructed_audio, _ = nemo_codec_model.decode(tokens=encoded_tokens, tokens_len=encoded_len)
+
+# save reconstructed audio
+output_audio = reconstructed_audio.cpu().numpy().squeeze()
+sf.write(path_to_output_audio, output_audio, nemo_codec_model.sample_rate)
+
+```
+
 Alternatively, you can manually download the [checkpoint](https://huggingface.co/nvidia/low-frame-rate-speech-codec-22khz/resolve/main/low-frame-rate-speech-codec-22khz.nemo) and use the code below to make an inference on the model:
 
 ```
@@ -68,6 +100,7 @@ path_to_output_audio = ??? # path of the reconstructed output audio
 
 nemo_codec_model = AudioCodecModel.restore_from(restore_path=codec_path, map_location="cpu").eval()
 
+nemo_codec_model = AudioCodecModel.from_pretrained("nvidia/low-frame-rate-speech-codec-22khz").eval()
 # get discrete tokens from audio
 audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
 
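Both README snippets leave the input and output paths as `???` placeholders, so neither runs until those are filled in. The following is a minimal sketch, not part of the README or of NeMo, that wraps the same calls used in the README (`AudioCodecModel.from_pretrained`, `AudioCodecModel.restore_from`, `encode`, `decode`) in a small command-line script that takes the paths as arguments. The function names `parse_args`, `reconstruct` and `main`, and the `--checkpoint` and `--model-name` flags, are hypothetical; the script assumes `librosa`, `soundfile`, `torch` and NeMo are installed.

```python
import argparse
from typing import Optional, Sequence

# Model name taken from the README; the CLI flags below are hypothetical.
MODEL_NAME = "nvidia/low-frame-rate-speech-codec-22khz"


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the paths that replace the README's ??? placeholders."""
    parser = argparse.ArgumentParser(
        description="Encode audio to discrete tokens and reconstruct it with a NeMo audio codec."
    )
    parser.add_argument("input_audio", help="path of the input audio")
    parser.add_argument("output_audio", help="path of the reconstructed output audio")
    parser.add_argument(
        "--checkpoint",
        default=None,
        help="optional local .nemo file; if omitted, the model is downloaded by name",
    )
    parser.add_argument("--model-name", default=MODEL_NAME, help="pretrained model name")
    return parser.parse_args(argv)


def reconstruct(input_audio: str, output_audio: str,
                checkpoint: Optional[str] = None, model_name: str = MODEL_NAME) -> None:
    """Run the README's encode/decode round trip on a single file."""
    # Heavy dependencies are imported here so argument parsing works without them.
    import librosa
    import soundfile as sf
    import torch
    from nemo.collections.tts.models import AudioCodecModel

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load either the manually downloaded checkpoint or the named pretrained model,
    # then move it to the same device as the input tensors.
    if checkpoint is not None:
        model = AudioCodecModel.restore_from(restore_path=checkpoint, map_location="cpu")
    else:
        model = AudioCodecModel.from_pretrained(model_name)
    model = model.eval().to(device)

    # Resample to the codec's sample rate and add a batch dimension.
    audio, _ = librosa.load(input_audio, sr=model.sample_rate)
    audio_tensor = torch.from_numpy(audio).unsqueeze(dim=0).to(device)
    audio_len = torch.tensor([audio_tensor.shape[1]], device=device)

    # Inference only, so no autograd graph is needed.
    with torch.no_grad():
        tokens, tokens_len = model.encode(audio=audio_tensor, audio_len=audio_len)
        reconstructed, _ = model.decode(tokens=tokens, tokens_len=tokens_len)

    sf.write(output_audio, reconstructed.cpu().numpy().squeeze(), model.sample_rate)


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = parse_args(argv)
    reconstruct(args.input_audio, args.output_audio, args.checkpoint, args.model_name)
```

Saved as, for example, `reconstruct.py` with a final `if __name__ == "__main__": main()` line, it could be run as `python reconstruct.py input.wav output.wav`, or with `--checkpoint low-frame-rate-speech-codec-22khz.nemo` to use the manually downloaded file instead of downloading by name. Calling `.to(device)` on the model keeps it on the same device as the input tensors regardless of which loading path was taken.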