Note: These are only the weights for the classifier trained on the whisper-small embeddings.
Results of the classifier on Rob's human-annotated dataset (data/voicemail_human_eval.csv):
Results for chunk size 1 second:
- Accuracy: 0.7480
- Precision: 0.8681
- Recall: 0.7396
- F1 Score: 0.7987
Results for chunk size 2 seconds:
- Accuracy: 0.7880
- Precision: 0.9085
- Recall: 0.7633
- F1 Score: 0.8296
Results for chunk size 5 seconds:
- Accuracy: 0.8480
- Precision: 0.9456
- Recall: 0.8225
- F1 Score: 0.8797
Results for chunk size 10 seconds:
- Accuracy: 0.8720
- Precision: 0.9790
- Recall: 0.8284
- F1 Score: 0.8974
Results for full audio samples:
- Accuracy: 0.8760
- Precision: 0.9929
- Recall: 0.8225
- F1 Score: 0.8997
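The four metrics above are the standard binary-classification scores. The sketch below shows one way to compute them from per-sample labels and predictions. It assumes label 1 marks the voicemail (positive) class, which is a hypothetical encoding: the actual label scheme of data/voicemail_human_eval.csv is not documented here. The helper names binary_metrics and f1_from_pr are illustrative only. The sketch also shows that each reported F1 score is the harmonic mean of the reported precision and recall.

```python
def f1_from_pr(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier.

    Assumption (hypothetical): `positive` is the label used for voicemail.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_pr(precision, recall),
    }


if __name__ == "__main__":
    # Cross-check against the reported 1-second-chunk figures.
    print(round(f1_from_pr(0.8681, 0.7396), 4))  # prints 0.7987
```

Because precision here is high while recall is lower at every chunk size, the F1 score tracks recall closely. That is why the gains from longer chunks show up mainly in recall and F1.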