---
license: apache-2.0
language:
- en
metrics:
- cer
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# Model

This model is [Wav2Vec2-Large-XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
fine-tuned on the manually annotated subset of the
[L2-Arctic dataset](https://psi.engr.tamu.edu/l2-arctic-corpus/)
to perform automatic phonetic transcription in IPA.
Fine-tuning followed a procedure similar to the one described
by [vitouphy](https://huggingface.co/vitouphy/wav2vec2-xls-r-300m-timit-phoneme)
for the TIMIT dataset.

# Usage

To use the model, create a pipeline and invoke it with
the path to your WAV file, which must be sampled at 16 kHz.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
pipe = pipeline(model="mrrubino/wav2vec2-large-xlsr-53-l2-arctic-phoneme")

# The pipeline returns a dict; the "text" key holds the IPA transcription
transcription = pipe("file.wav")["text"]
```
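
Since the model expects 16 kHz input, it can help to check a file's sample rate before transcribing it. The following is a minimal sketch using Python's standard `wave` module. The helper name `is_16khz` is hypothetical (it is not part of this model or of `transformers`), and the sketch assumes an uncompressed PCM WAV file that `wave` can open.

```python
import wave


def is_16khz(path, expected_rate=16000):
    """Return True if the WAV file at `path` is sampled at `expected_rate` Hz.

    Hypothetical helper for illustration; assumes an uncompressed PCM WAV file.
    """
    # wave.open parses the WAV header, which stores the sample rate
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == expected_rate
```

If `is_16khz("file.wav")` returns `False`, the audio should be resampled to 16 kHz before it is passed to the pipeline.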

# Results

The manually annotated subset of L2-Arctic was divided
into training and test sets with a 90/10 split.
The performance metrics on the test set are
included below.

WER: 0.425

CER: 0.128
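
For reference, WER and CER are both edit-distance error rates: the number of substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the reference length. WER counts whitespace-separated tokens and CER counts characters. The sketch below illustrates these standard definitions in plain Python. It is not necessarily the tooling used to produce the numbers above, and the function names and sample strings are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] is the distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens (non-empty reference)."""
    ref_tokens = reference.split()
    return edit_distance(ref_tokens, hypothesis.split()) / len(ref_tokens)


def cer(reference, hypothesis):
    """Character error rate, spaces included (non-empty reference)."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up transcriptions, for illustration only
reference = "ðə kæt sæt"
hypothesis = "ðə kat sæt"
print(wer(reference, hypothesis))  # one of three tokens differs
print(cer(reference, hypothesis))  # one of ten characters differs
```

Because the error count is divided by the reference length, both rates can exceed 1.0 when the prediction contains many insertions.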