Cantonese Wav2Vec2-Conformer-Base with Relative Position Embeddings
wav2vec 2.0 Conformer with relative position embeddings, pretrained on 2.8K hours of Cantonese spontaneous speech data sampled at 16kHz.
Note: This model has not been fine-tuned on labeled text data.
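Because the model is pretrained only, it is typically used as a feature extractor or fine-tuned further. The sketch below shows frame-level feature extraction with the Hugging Face transformers Wav2Vec2ConformerModel class. This card does not give a repository id, so the sketch builds a small, randomly initialised stand-in from Wav2Vec2ConformerConfig with position_embeddings_type="relative" so that it runs offline. The tiny layer sizes are hypothetical and do not match this model. The input normalisation is also an assumption. For real features, load the released checkpoint with Wav2Vec2ConformerModel.from_pretrained(...) and use its own preprocessor settings.

```python
import torch
from transformers import Wav2Vec2ConformerConfig, Wav2Vec2ConformerModel

SAMPLE_RATE = 16_000  # the card states the pretraining audio was sampled at 16 kHz


def build_demo_model() -> Wav2Vec2ConformerModel:
    # Hypothetical tiny, randomly initialised stand-in so the sketch runs offline.
    # For real features, replace this with Wav2Vec2ConformerModel.from_pretrained(<repo id>).
    config = Wav2Vec2ConformerConfig(
        hidden_size=64,
        num_hidden_layers=2,
        num_attention_heads=4,
        intermediate_size=128,
        position_embeddings_type="relative",  # relative position embeddings, as in this model
    )
    return Wav2Vec2ConformerModel(config).eval()


def extract_features(model: Wav2Vec2ConformerModel, waveform: torch.Tensor) -> torch.Tensor:
    """Return hidden states of shape (frames, hidden_size) for a mono 16 kHz waveform."""
    # Assumption: zero-mean, unit-variance input normalisation, common for wav2vec 2.0 models.
    x = (waveform - waveform.mean()) / torch.sqrt(waveform.var() + 1e-7)
    with torch.no_grad():
        out = model(x.unsqueeze(0))  # add a batch dimension
    return out.last_hidden_state[0]  # drop the batch dimension


model = build_demo_model()
one_second = torch.randn(SAMPLE_RATE)  # placeholder audio: 1 s of noise at 16 kHz
feats = extract_features(model, one_second)
print(feats.shape)  # roughly one frame per 20 ms of audio
```

The convolutional front end downsamples the waveform by a factor of about 320, so each output frame covers roughly 20 ms of 16 kHz audio. Audio at other sampling rates should be resampled to 16 kHz first.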
Alternative Version
An alternative version of the model, pre-trained on the same dataset but with layer_norm_first set to false, is available here as a fairseq checkpoint and may give better downstream results.
Citation
Please cite the following paper if you use the model.
@inproceedings{huang23h_interspeech,
author={Ranzo Huang and Brian Mak},
title={{wav2vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
pages={4958--4962},
doi={10.21437/Interspeech.2023-2470}
}