|
--- |
|
language: |
|
- uk |
|
- en |
|
- ru |
|
license: mit |
|
tags: |
|
- gpt3 |
|
- transformers |
|
- mgpt |
|
--- |
|
# 🇺🇦 Ukrainian mGPT 1.3B |
|
|
|
A language model for Ukrainian. As the name suggests, the model has 1.3B parameters. |
|
|
|
Ukrainian belongs to the Indo-European language family. It is a very melodic language with approximately 40 million speakers. Here are some facts about it: |
|
|
|
1. One of the East Slavic languages, alongside Russian and Belarusian. |
|
2. It is the official language of Ukraine and is written in a version of the Cyrillic script. |
|
3. Ukrainian has a rich literary history and has maintained a vibrant cultural presence, especially in poetry and music. |
|
|
|
## Technical details |
|
|
|
It's one of the models derived from the base [mGPT-XL (1.3B)](https://huggingface.co/ai-forever/mGPT) model (see the list below), which was originally trained on 61 languages from 25 language families using the Wikipedia and C4 corpora. |
|
|
|
We found additional data for 23 languages, most of which are considered minor, and decided to further tune the base model. **Ukrainian mGPT 1.3B** was trained for another 10000 steps with batch_size=4 and a context window of **2048** tokens on 1 A100. |
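To give a rough sense of the scale of this fine-tuning run, the settings above can be multiplied out. This is only a back-of-the-envelope sketch: it assumes every sequence fills the full 2048-token window and that each step processes exactly one batch (no gradient accumulation), neither of which is stated in this card. The function name `max_tokens_seen` is illustrative.

```python
# Training settings quoted in the model card.
STEPS = 10_000
BATCH_SIZE = 4
CONTEXT_WINDOW = 2048


def max_tokens_seen(steps: int, batch_size: int, context_window: int) -> int:
    """Upper bound on tokens processed during fine-tuning.

    Assumes full-length sequences and one batch per step; both are
    assumptions, not facts taken from the model card.
    """
    return steps * batch_size * context_window


tokens_per_step = BATCH_SIZE * CONTEXT_WINDOW
total_tokens = max_tokens_seen(STEPS, BATCH_SIZE, CONTEXT_WINDOW)

print(f"tokens per step (at most): {tokens_per_step:,}")  # 8,192
print(f"tokens seen (at most): {total_tokens:,}")         # 81,920,000
```

Under these assumptions the run covers at most about 82 million tokens; the real figure may be lower if sequences were shorter than the window.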
|
|
|
The final validation perplexity for this model is **7.1**. |
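For readers comparing this figure with the loss curve, perplexity is conventionally the exponential of the mean per-token cross-entropy. The sketch below assumes that convention (natural logarithm, token-level averaging); the card does not say how the value was computed, and the helper names are illustrative.

```python
import math


def perplexity_from_loss(mean_nll: float) -> float:
    """Perplexity for a mean per-token negative log-likelihood (in nats)."""
    return math.exp(mean_nll)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse mapping: the mean loss (in nats) implied by a perplexity."""
    return math.log(perplexity)


reported_perplexity = 7.1  # value quoted in the model card
implied_loss = loss_from_perplexity(reported_perplexity)

print(f"implied mean loss: {implied_loss:.2f} nats per token")  # 1.96
```

Under this assumption, a validation perplexity of 7.1 corresponds to a mean validation loss of roughly 1.96.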
|
|
|
_Chart of the training loss and perplexity:_ |
|
|
|
![](https://i.imgur.com/DppAo6e.png) |
|
|
|
## Other mGPT-1.3B models |
|
|
|
- [🇦🇲 mGPT-1.3B Armenian](https://huggingface.co/ai-forever/mGPT-1.3B-armenian) |
|
- [🇦🇿 mGPT-1.3B Azerbaijan](https://huggingface.co/ai-forever/mGPT-1.3B-azerbaijan) |
|
- [mGPT-1.3B Bashkir](https://huggingface.co/ai-forever/mGPT-1.3B-bashkir) |
|
- [🇧🇾 mGPT-1.3B Belorussian](https://huggingface.co/ai-forever/mGPT-1.3B-belorussian) |
|
- [🇧🇬 mGPT-1.3B Bulgarian](https://huggingface.co/ai-forever/mGPT-1.3B-bulgarian) |
|
- [mGPT-1.3B Buryat](https://huggingface.co/ai-forever/mGPT-1.3B-buryat) |
|
- [mGPT-1.3B Chuvash](https://huggingface.co/ai-forever/mGPT-1.3B-chuvash) |
|
- [🇬🇪 mGPT-1.3B Georgian](https://huggingface.co/ai-forever/mGPT-1.3B-georgian) |
|
- [mGPT-1.3B Kalmyk](https://huggingface.co/ai-forever/mGPT-1.3B-kalmyk) |
|
- [🇰🇿 mGPT-1.3B Kazakh](https://huggingface.co/ai-forever/mGPT-1.3B-kazakh) |
|
- [🇰🇬 mGPT-1.3B Kirgiz](https://huggingface.co/ai-forever/mGPT-1.3B-kirgiz) |
|
- [mGPT-1.3B Mari](https://huggingface.co/ai-forever/mGPT-1.3B-mari) |
|
- [🇲🇳 mGPT-1.3B Mongol](https://huggingface.co/ai-forever/mGPT-1.3B-mongol) |
|
- [mGPT-1.3B Ossetian](https://huggingface.co/ai-forever/mGPT-1.3B-ossetian) |
|
- [🇮🇷 mGPT-1.3B Persian](https://huggingface.co/ai-forever/mGPT-1.3B-persian) |
|
- [🇷🇴 mGPT-1.3B Romanian](https://huggingface.co/ai-forever/mGPT-1.3B-romanian) |
|
- [🇹🇯 mGPT-1.3B Tajik](https://huggingface.co/ai-forever/mGPT-1.3B-tajik) |
|
- [mGPT-1.3B Tatar](https://huggingface.co/ai-forever/mGPT-1.3B-tatar) |
|
- [🇹🇲 mGPT-1.3B Turkmen](https://huggingface.co/ai-forever/mGPT-1.3B-turkmen) |
|
- [mGPT-1.3B Tuvan](https://huggingface.co/ai-forever/mGPT-1.3B-tuvan) |
|
- [🇺🇿 mGPT-1.3B Uzbek](https://huggingface.co/ai-forever/mGPT-1.3B-uzbek) |
|
- [mGPT-1.3B Yakut](https://huggingface.co/ai-forever/mGPT-1.3B-yakut) |
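The links above follow a common naming pattern, so loading any of the listed models with the `transformers` library might look like the sketch below. The helper names `repo_id` and `generate` are illustrative and not part of any library. The sketch assumes the checkpoints load through the standard `AutoTokenizer` and `AutoModelForCausalLM` classes, which this card does not confirm, and `transformers` is imported lazily so the ID helper works without it installed.

```python
# Language suffixes taken from the links listed above.
LISTED_LANGUAGES = (
    "armenian", "azerbaijan", "bashkir", "belorussian", "bulgarian",
    "buryat", "chuvash", "georgian", "kalmyk", "kazakh", "kirgiz",
    "mari", "mongol", "ossetian", "persian", "romanian", "tajik",
    "tatar", "turkmen", "tuvan", "uzbek", "yakut",
)


def repo_id(language: str) -> str:
    """Build the Hub repository ID for one of the listed languages."""
    name = language.strip().lower()
    if name not in LISTED_LANGUAGES:
        raise ValueError(f"{language!r} is not in the list above")
    return f"ai-forever/mGPT-1.3B-{name}"


def generate(language: str, prompt: str, max_new_tokens: int = 30) -> str:
    """Download a listed model and continue `prompt` (hypothetical helper)."""
    # Imported here so that repo_id() is usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = repo_id(language)
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


print(repo_id("Tatar"))  # ai-forever/mGPT-1.3B-tatar
```

Calling `generate("tatar", "...")` downloads the weights on first use, so it needs network access and several gigabytes of disk space and memory.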
|
|
|
## Feedback |
|
|
|
If you find a bug or have additional data to train a model for your language, **please give us feedback**. |
|
|
|
The model will be improved over time. Stay tuned! |
|
|