This model is a whisper-small model fine-tuned on 1M audio samples from the mitermix/audiosnippets dataset.
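
A minimal loading sketch, assuming the checkpoint is published in the standard Hugging Face Whisper format and can be used through the `transformers` automatic-speech-recognition pipeline. The repository id `MODEL_ID` and the helper names `build_pipeline` and `run_on_file` are hypothetical placeholders, not names taken from this model card. The kind of text the model produces depends on how the dataset was labeled, which is not stated here.

```python
# Hypothetical placeholder: replace with the actual repository id of this checkpoint.
MODEL_ID = "your-namespace/your-finetuned-whisper-small"


def build_pipeline(model_id: str = MODEL_ID):
    """Return a transformers speech-recognition pipeline for a Whisper checkpoint."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def run_on_file(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Run the model on one audio file and return the generated text."""
    asr = build_pipeline(model_id)
    # Passing a file path makes the pipeline decode the audio, which requires ffmpeg.
    result = asr(audio_path)
    return result["text"]
```

Once `MODEL_ID` points at the real checkpoint, calling `run_on_file("sample.wav")` (with `sample.wav` standing in for any local audio file) should return the generated text for that clip.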