---
license: apache-2.0
language:
- yue
library_name: transformers
---

# Cantonese Wav2Vec2-Conformer-Base with Relative Position Embeddings

A wav2vec 2.0 Conformer with relative position embeddings, pretrained on 2.8K hours of Cantonese spontaneous speech sampled at 16 kHz. Input speech should likewise be sampled at 16 kHz.

Note: This model has not been fine-tuned on labeled text data. It must be fine-tuned on transcribed speech before it can be used for speech recognition.

## Alternative Version

An alternative version of the model is available [here](https://drive.google.com/file/d/1rbP-6pZfR5ieqAwd5_X2KzipLuKpXSsQ/view?usp=sharing) as a fairseq checkpoint. It was pretrained on the same dataset but sets `layer_norm_first` to `false`, and it may give better downstream results.

## Citation

Please cite the following paper if you use the model.

```
@inproceedings{rcfhuang23_interspeech,
  author    = {Ranzo C. F. Huang and Brian Mak},
  year      = {2023},
  title     = {{wav2vec 2.0 ASR} for {Cantonese}-Speaking Older Adults in a Clinical Setting},
  booktitle = {Interspeech 2023},
  pubstate  = {forthcoming},
}
```