---
license: mit
datasets:
- mozilla-foundation/common_voice_13_0
language:
- ca
- ta
- th
tags:
- automatic-speech-recognition
inference: false
pipeline_tag: automatic-speech-recognition
---

## About

Multilingual DistilWhisper improves ASR performance in target languages by adding lightweight CLSR (conditional language-specific routing) modules on top of whisper-small.
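
As a rough illustration of the CLSR modules mentioned above, the sketch below mixes the output of a shared layer with the output of a language-specific layer through a gate. It assumes the general CLSR formulation (a sigmoid gate weighting the two paths) and is not taken from the DistilWhisper code; the names `clsr_mix`, `shared_fn`, `lang_fn`, and `gate_logit` are hypothetical, and a single scalar gate stands in for a learned one.

```python
import math
from typing import Callable, List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping a gate logit to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def clsr_mix(
    hidden: Vector,
    shared_fn: Callable[[Vector], Vector],
    lang_fn: Callable[[Vector], Vector],
    gate_logit: float,
) -> Vector:
    """Blend a shared path and a language-specific path.

    Assumption: the gate is one scalar per call. In a trained model the gate
    would be computed from the hidden state by learned parameters.
    """
    g = sigmoid(gate_logit)
    shared_out = shared_fn(hidden)  # stand-in for the frozen base-model layer
    lang_out = lang_fn(hidden)      # stand-in for the language-specific module
    return [g * l + (1.0 - g) * s for l, s in zip(lang_out, shared_out)]


# Toy stand-ins for the two paths (hypothetical, for illustration only).
def shared_layer(h: Vector) -> Vector:
    return [2.0 * v for v in h]


def catalan_layer(h: Vector) -> Vector:
    return [v + 1.0 for v in h]


# A gate logit of 0.0 gives g = 0.5, i.e. an even blend of both paths.
mixed = clsr_mix([1.0, 2.0], shared_layer, catalan_layer, gate_logit=0.0)
print(mixed)
```

With a strongly positive gate logit the output approaches the language-specific path, and with a strongly negative one it falls back to the shared path.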

These modules are trained with a combination of cross-entropy (ASR) and knowledge distillation losses, with whisper-large-v2 as the teacher.
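
The combined objective can be sketched for a single output token as follows. This is a minimal illustration under stated assumptions, not the training code: the distillation term is written here as a KL divergence between teacher and student distributions, and `alpha` is a hypothetical weighting. The actual divergence and weighting used for DistilWhisper are described in the paper and repository.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits: List[float], target_index: int) -> float:
    """ASR term: negative log-probability of the reference token."""
    return -math.log(softmax(student_logits)[target_index])


def kl_divergence(teacher_logits: List[float], student_logits: List[float]) -> float:
    """Distillation term (assumed form): KL(teacher || student)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    target_index: int,
    alpha: float = 0.5,  # hypothetical weight of the distillation term
) -> float:
    """Per-token loss: cross-entropy plus a weighted distillation term."""
    ce = cross_entropy(student_logits, target_index)
    kd = kl_divergence(teacher_logits, student_logits)
    return ce + alpha * kd


# Toy example: the student is uncertain, the teacher prefers token 0.
loss = distillation_loss([0.0, 0.0], [2.0, 0.0], target_index=0)
print(loss)
```

When the student already matches the teacher, the distillation term vanishes and only the cross-entropy term remains. In practice the loss would be averaged over all tokens in a batch.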

More details in the ICASSP 2024 paper: https://arxiv.org/abs/2311.01070

## Inference

Code for training and inference is available at: https://github.com/naver/multilingual-distilwhisper

## Citation

```
@inproceedings{ferraz2024distilwhisper,
  title={Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts},
  author={Ferraz, Thomas Palmeira and Boito, Marcely Zanon and Brun, Caroline and Nikoulina, Vassilina},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2024},
  organization={IEEE}
}
```