# ABX-based evaluation
The ABX discriminability test is used to evaluate the phonetic quality of the obtained discrete units: given two sounds A and B from different phonetic categories and a sound X from the same category as A, the ABX error rate measures how often X ends up closer to B than to A in the representation space (lower is better).
The ABX-based evaluation for Speech-to-Unit consists of the following steps:
- Train an acoustic model (or use an existing acoustic model)
- Quantize the speech by learning a K-means clustering model over the acoustic features
- Compute discrete features for the ABX computation using the learned clusters
- Compute the ABX score over the discrete features, taking advantage of libri-light's ABX evaluation script
Here we assume that you have already gone through the first two steps and focus solely on extracting features and computing ABX scores.
## Libri-light setup
Follow libri-light's instructions for installation and ABX evaluation setup (including the download of the data items required for ABX computation).
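A minimal sketch of that setup is below; the authoritative steps, including where to obtain the ABX `.item` files, are in libri-light's own eval documentation.

```bash
# Clone libri-light; its eval/ directory contains the ABX evaluation tooling.
git clone https://github.com/facebookresearch/libri-light.git
LIBRILIGHT_ROOT="$PWD/libri-light"

# Install the eval dependencies and download the ABX data items by following
# libri-light's eval instructions, so that the .item files end up under
# $LIBRILIGHT_ROOT/eval/ABX_data/ (the path used in the commands below).
```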
## Computing ABX
### Dumping quantized features
The first step for the ABX computation is to dump the quantized representations corresponding to the test files.
TYPE="hubert"
LAYER=6
CKPT_PATH="<PATH_TO_HUBERT_MODEL_CHECKPOINT_FILE>"
KM_MODEL_PATH="<PATH_TO_PRETRAINED_KM_MODEL_FILE>"
SUBSET="dev-clean"
MANIFEST="<PATH_TO_MANIFEST_FOR_LS_DEV-CLEAN>"
DATA_DIR="<PATH_TO_DIR_TO_STORE_FEATURES>/$SUBSET"
PYTHONPATH=. python examples/textless_nlp/gslm/metrics/abx_metrics/dump_abx_feats.py \
--feature_type $TYPE \
--kmeans_model_path $KM_MODEL_PATH \
--checkpoint_path $CKPT_PATH \
--layer $LAYER \
--manifest_path $MANIFEST \
--out_dir_path $DATA_DIR \
--extension ".flac"
Again, the manifest file follows the same structure as elsewhere in the codebase.
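For reference, a minimal sketch of such a manifest for LibriSpeech dev-clean: the first line is the root directory of the audio files, and each following line is a tab-separated relative path and number of samples (the counts below are illustrative).

```
<PATH_TO_LIBRISPEECH>/dev-clean
1272/128104/1272-128104-0000.flac	93680
1272/128104/1272-128104-0001.flac	77040
```

The dumping script should then write one quantized feature file per manifest entry (with the `.npy` extension) into `$DATA_DIR`; these are the files consumed by libri-light's ABX evaluation below.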
### Compute ABX with Libri-light
Use libri-light's `eval_ABX.py` script (within the appropriate environment) as follows:
```bash
LIBRILIGHT_ROOT="<PATH_TO_LIBRILIGHT>"

SUBSET="dev-clean"
DATA_DIR="<PATH_TO_DIR_TO_STORE_FEATURES>/$SUBSET"
ITEM_FILE_PATH="$LIBRILIGHT_ROOT/eval/ABX_data/$SUBSET.item"
OUT_DIR="<PATH_TO_DIR_TO_STORE_ABX_SCORES>/$SUBSET"

FILE_EXTENSION=".npy"
FEATURE_SIZE=0.02 # depends on the model used

PYTHONPATH=$LIBRILIGHT_ROOT \
    python $LIBRILIGHT_ROOT/eval/eval_ABX.py \
    $DATA_DIR \
    $ITEM_FILE_PATH \
    --file_extension $FILE_EXTENSION \
    --feature_size $FEATURE_SIZE \
    --out $OUT_DIR \
    --mode "all"
```
Note that `FEATURE_SIZE` depends on the model type used to extract the acoustic features:
- For HuBERT and wav2vec 2.0, use `FEATURE_SIZE=0.02` (one feature frame every 20 ms).
- For CPC and log-Mel, use `FEATURE_SIZE=0.01` (one feature frame every 10 ms).
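If you script the two steps together, `FEATURE_SIZE` can be derived from the same `TYPE` variable used when dumping the features. A small sketch; the exact `--feature_type` strings matched below are assumptions, so adjust the patterns to the values you actually use:

```bash
# Pick the ABX feature size from the feature type used in the dumping step.
case "$TYPE" in
  hubert|w2v2) FEATURE_SIZE=0.02 ;;  # 20 ms per frame (e.g. a 320-sample hop at 16 kHz)
  cpc|logmel)  FEATURE_SIZE=0.01 ;;  # 10 ms per frame
  *) echo "Unknown feature type: $TYPE" >&2; exit 1 ;;
esac
```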
If you have a GPU available, make sure to add the `--cuda` flag for faster computation.
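To evaluate several subsets in one go (for example dev-clean and dev-other), both steps can be wrapped in a loop over `SUBSET`. A sketch of the evaluation half, assuming features were already dumped for each subset and that the corresponding `.item` files are present under `$LIBRILIGHT_ROOT/eval/ABX_data`:

```bash
# Run the ABX evaluation for several LibriSpeech subsets.
for SUBSET in dev-clean dev-other; do
  DATA_DIR="<PATH_TO_DIR_TO_STORE_FEATURES>/$SUBSET"
  ITEM_FILE_PATH="$LIBRILIGHT_ROOT/eval/ABX_data/$SUBSET.item"
  OUT_DIR="<PATH_TO_DIR_TO_STORE_ABX_SCORES>/$SUBSET"

  PYTHONPATH=$LIBRILIGHT_ROOT \
    python $LIBRILIGHT_ROOT/eval/eval_ABX.py \
    $DATA_DIR \
    $ITEM_FILE_PATH \
    --file_extension ".npy" \
    --feature_size $FEATURE_SIZE \
    --out $OUT_DIR \
    --mode "all" \
    --cuda  # drop this flag if no GPU is available
done
```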