
Speech to Unit Model (speech2unit)

Acoustic Model

To quantize speech, we learn a K-means clustering over acoustic representations. These representations are either log-Mel filterbank features or the outputs of a pretrained acoustic representation model. To use a pretrained model, download its checkpoint from its respective release location.
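As an illustration of the log-Mel filterbank option, the sketch below computes frame-level log-Mel features with NumPy alone. The parameters (16 kHz audio, 25 ms Hann windows with a 10 ms hop, 80 HTK-style mel bands) and the function names are assumptions chosen for illustration. The repository's `logmel` feature type may use a different implementation and different settings.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=np.float64) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2))
    fb = np.zeros((n_mels, len(bin_freqs)))
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel_features(wav, sample_rate=16000, win=400, hop=160, n_mels=80):
    """Return a (num_frames, n_mels) array of log-mel features for a 1-D waveform."""
    wav = np.asarray(wav, dtype=np.float64)
    if len(wav) < win:  # pad very short inputs to one full window
        wav = np.pad(wav, (0, win - len(wav)))
    num_frames = 1 + (len(wav) - win) // hop
    # Gather overlapping frames: row t covers samples [t * hop, t * hop + win)
    idx = hop * np.arange(num_frames)[:, None] + np.arange(win)[None, :]
    frames = wav[idx] * np.hanning(win)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (num_frames, win // 2 + 1)
    mel = power @ mel_filterbank(n_mels, win, sample_rate).T
    return np.log(mel + 1e-10)  # small floor avoids log(0)


# One second of noise at 16 kHz stands in for real speech
feats = log_mel_features(np.random.default_rng(0).standard_normal(16000))
print(feats.shape)  # (98, 80)
```

Each row of `feats` is one frame, and these rows are the vectors the K-means model clusters.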

Quantization Model

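You can download pretrained quantization (K-means) models from the list below. Each entry pairs an acoustic representation with a K-means model of 50, 100, 200 or 500 clusters (KM50 to KM500).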

K-Means Model                  Download Link
Log Mel Filterbank + KM50      download
Log Mel Filterbank + KM100     download
Log Mel Filterbank + KM200     download
Log Mel Filterbank + KM500     download
Modified CPC + KM50            download
Modified CPC + KM100           download
Modified CPC + KM200           download
Modified CPC + KM500           download
HuBERT Base + KM50             download
HuBERT Base + KM100            download
HuBERT Base + KM200            download
HuBERT Base + KM500            download
wav2vec 2.0 Large + KM50       download
wav2vec 2.0 Large + KM100      download
wav2vec 2.0 Large + KM200      download
wav2vec 2.0 Large + KM500      download
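Conceptually, a KM-K model in the list above is a set of K centroids learned over frame-level acoustic features. Row k of that set is the embedding of discrete unit k. The sketch below fits such centroids with plain Lloyd's iterations and random initialisation. Both choices are assumptions for illustration, and `fit_kmeans` is a hypothetical helper. The repository's `cluster_kmeans.py` may use a different K-means variant, initialisation and stopping rule.

```python
import numpy as np


def fit_kmeans(features, n_clusters, max_iters=300, seed=0):
    """Lloyd's K-means over frame features of shape (num_frames, dim).

    Returns centroids of shape (n_clusters, dim); row k is the embedding of unit k.
    """
    features = np.asarray(features, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialise with randomly chosen frames (requires num_frames >= n_clusters)
    centroids = features[rng.choice(len(features), size=n_clusters, replace=False)]
    labels = None
    for _ in range(max_iters):
        # Squared Euclidean distance from every frame to every centroid
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments no longer change: converged
        labels = new_labels
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[k] = members.mean(axis=0)
    return centroids


# Random stand-in features; a "KM50"-style model has 50 centroids
feats = np.random.default_rng(0).standard_normal((500, 16))
centroids = fit_kmeans(feats, n_clusters=50)
print(centroids.shape)  # (50, 16)
```

The number of clusters sets the size of the unit vocabulary. A KM100 model, for example, maps every frame to one of 100 unit ids.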

Quantization

For quantizing speech with a given acoustic representation, please follow the steps below.

  1. Learn a K-means clustering model
N_CLUSTERS=<number_of_clusters_used_for_kmeans>
TYPE=<one_of_logmel/cpc/hubert/w2v2>
CKPT_PATH=<path_of_pretrained_acoustic_model>
LAYER=<layer_of_acoustic_model_to_extract_features_from>
MANIFEST=<tab_separated_manifest_of_audio_files_for_training_kmeans>
KM_MODEL_PATH=<output_path_of_the_kmeans_model>

PYTHONPATH=. python examples/textless_nlp/gslm/speech2unit/clustering/cluster_kmeans.py \
    --num_clusters $N_CLUSTERS \
    --feature_type $TYPE \
    --checkpoint_path $CKPT_PATH \
    --layer $LAYER \
    --manifest_path $MANIFEST \
    --out_kmeans_model_path $KM_MODEL_PATH
  2. Quantize using the learned clusters (reusing TYPE, CKPT_PATH, LAYER and KM_MODEL_PATH from step 1)
MANIFEST=<tab_separated_manifest_of_audio_files_to_quantize>
OUT_QUANTIZED_FILE=<output_quantized_audio_file_path>

python examples/textless_nlp/gslm/speech2unit/clustering/del/quantize_with_kmeans.py \
    --feature_type $TYPE \
    --kmeans_model_path $KM_MODEL_PATH \
    --checkpoint_path $CKPT_PATH \
    --layer $LAYER \
    --manifest_path $MANIFEST \
    --out_quantized_file_path $OUT_QUANTIZED_FILE \
    --extension ".flac"
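Conceptually, the quantization step maps every frame of every file in the manifest to the index of its nearest K-means centroid. Each utterance thus becomes a sequence of integer units in [0, N_CLUSTERS). The minimal NumPy sketch below shows that assignment. `quantize`, `format_units` and the `<file_id>|<units>` line layout are hypothetical illustrations and not necessarily what `quantize_with_kmeans.py` writes to OUT_QUANTIZED_FILE.

```python
import numpy as np


def quantize(features, centroids):
    """Assign each frame (row of `features`) the index of its nearest centroid."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def format_units(file_id, units):
    # Hypothetical line format: "<file_id>|<space-separated unit ids>"
    return f"{file_id}|{' '.join(str(int(u)) for u in units)}"


# Toy example: a 2-cluster model over 2-D features
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
features = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 1.0]])
units = quantize(features, centroids)
print(format_units("utt1", units))  # utt1|0 1 0
```

The broadcasted distance matrix has shape (num_frames, n_clusters, dim), which is fine for a sketch. For long files, processing frames in chunks keeps memory bounded.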

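Note: the manifest file lists the input audio files and their lengths. Its first line is the root directory containing the audio files. Each following line holds a file's path relative to that root and its number of frames, separated by a tab: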

<path_of_root_directory_containing_audio_files>
<relative_path_of_audio_file_1>\t<number_of_frames_1>
<relative_path_of_audio_file_2>\t<number_of_frames_2>
...
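A manifest in this format can be produced and read back with a few lines of Python. In the sketch below, `write_manifest`, `read_manifest` and `wav_num_frames` are hypothetical helpers. Frame counts come from the standard-library `wave` module, which reads only WAV files. For FLAC input (as in the `--extension ".flac"` example), you would need a different audio reader.

```python
import os
import wave


def wav_num_frames(path):
    """Number of sample frames in a WAV file (stdlib only; FLAC needs another reader)."""
    with wave.open(path, "rb") as f:
        return f.getnframes()


def write_manifest(root, rel_paths, out_path, num_frames_fn=wav_num_frames):
    """Write the root directory, then one '<relative_path>\t<num_frames>' line per file."""
    with open(out_path, "w") as out:
        out.write(root + "\n")
        for rel in rel_paths:
            n = num_frames_fn(os.path.join(root, rel))
            out.write(f"{rel}\t{n}\n")


def read_manifest(path):
    """Return (root, [(relative_path, num_frames), ...])."""
    with open(path) as f:
        root = f.readline().rstrip("\n")
        entries = []
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate a trailing blank line
            rel, n = line.split("\t")
            entries.append((rel, int(n)))
    return root, entries
```

Keeping the root on its own line lets the same manifest be reused after moving the audio directory: only the first line needs to change.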