---
license: mit
datasets:
- pkupie/mc2_corpus
language:
- bo
- ug
- mn
- kk
---

# MC^2XLMR-large

[GitHub Repo](https://github.com/luciusssss/mc2_corpus)
We continually pretrained XLM-RoBERTa-large on [MC^2](https://huggingface.co/datasets/pkupie/mc2_corpus), a corpus covering Tibetan, Uyghur, Kazakh in the Kazakh Arabic script, and Mongolian in the traditional Mongolian script.
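Since the model keeps the XLM-RoBERTa-large architecture, it can be loaded like any XLM-R masked language model with Hugging Face Transformers. A minimal sketch follows; the repo id `pkupie/mc2-xlmr-large` is an assumption based on this card's title, so substitute this model's actual Hub id.

```python
# Minimal usage sketch with Hugging Face Transformers.
# NOTE: "pkupie/mc2-xlmr-large" is an assumed repo id; replace it with
# this model's actual Hub id.
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "pkupie/mc2-xlmr-large"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Encode text in any of the supported languages (bo, ug, mn, kk)
# and read out contextual hidden states.
text = "..."  # replace with Tibetan / Uyghur / Kazakh / Mongolian text
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True)
print(outputs.hidden_states[-1].shape)  # (1, seq_len, 1024) for XLM-R large
```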
See the [paper](https://arxiv.org/abs/2311.08348) for details.

*We have also released another model trained on MC^2: [MC^2Llama-13B](https://huggingface.co/pkupie/mc2-llama-13b).*

## Citation

```
@article{zhang2024mc,
  title={MC$^2$: Towards Transparent and Culturally-Aware NLP for Minority Languages in China},
  author={Zhang, Chen and Tao, Mingxu and Huang, Quzhe and Lin, Jiuheng and Chen, Zhibin and Feng, Yansong},
  journal={arXiv preprint arXiv:2311.08348},
  year={2024}
}
```