ai-forever/mGPT-1.3B-bulgarian

ai-forever committed on Aug 11, 2023

Commit 43b477e • 1 Parent(s): 4e6141c

Upload README.md with huggingface_hub

Files changed (1): README.md +34 -7

@@ -19,18 +19,45 @@ Bulgarian belongs to Indo-European language family. It's a very phonetic languag
 2. It is the official language of Bulgaria.
 3. It was the first Slavic language attested in writing.
-## Dataset
-TBD
-## Technical details
-TBD
-## Examples of usage
-Try different generation strategies to reach better results.
-TBD
+## Technical details
+It's one of the models derived from the base [mGPT-XL (1.3B)](https://huggingface.co/ai-forever/mGPT) model (see the list below), which was originally trained on 61 languages from 25 language families using the Wikipedia and C4 corpora.
+We found additional data for 23 languages, most of which are considered minor, and decided to further tune the base model. **Bulgarian mGPT 1.3B** was trained for another 200 steps with batch_size=4 and a context window of **2048** tokens on 1 A100.
+The final perplexity of this model on the validation set is **15.2**.
+_Chart of the training loss and perplexity:_
+![](https://i.imgur.com/25IBYNG.png)
+## Other mGPT-1.3B models
+- [mGPT-1.3B-armenian](https://huggingface.co/ai-forever/mGPT-1.3B-armenian)
+- [mGPT-1.3B-azerbaijan](https://huggingface.co/ai-forever/mGPT-1.3B-azerbaijan)
+- [mGPT-1.3B-bashkir](https://huggingface.co/ai-forever/mGPT-1.3B-bashkir)
+- [mGPT-1.3B-belorussian](https://huggingface.co/ai-forever/mGPT-1.3B-belorussian)
+- [mGPT-1.3B-buryat](https://huggingface.co/ai-forever/mGPT-1.3B-buryat)
+- [mGPT-1.3B-chuvash](https://huggingface.co/ai-forever/mGPT-1.3B-chuvash)
+- [mGPT-1.3B-georgian](https://huggingface.co/ai-forever/mGPT-1.3B-georgian)
+- [mGPT-1.3B-kalmyk](https://huggingface.co/ai-forever/mGPT-1.3B-kalmyk)
+- [mGPT-1.3B-kazakh](https://huggingface.co/ai-forever/mGPT-1.3B-kazakh)
+- [mGPT-1.3B-kirgiz](https://huggingface.co/ai-forever/mGPT-1.3B-kirgiz)
+- [mGPT-1.3B-mari](https://huggingface.co/ai-forever/mGPT-1.3B-mari)
+- [mGPT-1.3B-mongol](https://huggingface.co/ai-forever/mGPT-1.3B-mongol)
+- [mGPT-1.3B-ossetian](https://huggingface.co/ai-forever/mGPT-1.3B-ossetian)
+- [mGPT-1.3B-persian](https://huggingface.co/ai-forever/mGPT-1.3B-persian)
+- [mGPT-1.3B-romanian](https://huggingface.co/ai-forever/mGPT-1.3B-romanian)
+- [mGPT-1.3B-tajik](https://huggingface.co/ai-forever/mGPT-1.3B-tajik)
+- [mGPT-1.3B-tatar](https://huggingface.co/ai-forever/mGPT-1.3B-tatar)
+- [mGPT-1.3B-turkmen](https://huggingface.co/ai-forever/mGPT-1.3B-turkmen)
+- [mGPT-1.3B-tuvan](https://huggingface.co/ai-forever/mGPT-1.3B-tuvan)
+- [mGPT-1.3B-ukranian](https://huggingface.co/ai-forever/mGPT-1.3B-ukranian)
+- [mGPT-1.3B-uzbek](https://huggingface.co/ai-forever/mGPT-1.3B-uzbek)
+- [mGPT-1.3B-yakut](https://huggingface.co/ai-forever/mGPT-1.3B-yakut)
+## Feedback
+If you find a bug or have additional data to train the model on your language, please give us feedback.
 The model will be improved over time. Stay tuned!
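
The numbers in the Technical details section (200 steps, batch_size=4, a 2048-token context window, validation perplexity 15.2) can be put in perspective with a little arithmetic. The sketch below is not part of the README or the training code. The helper names `tokens_seen` and `perplexity` are hypothetical. It assumes every training sequence fills the whole context window, that there is no gradient accumulation, and that perplexity is the usual exponential of the mean per-token cross-entropy in nats.

```python
import math
from typing import Sequence


def tokens_seen(steps: int, batch_size: int, context_window: int) -> int:
    """Upper bound on tokens processed during tuning.

    Assumption: each sequence fills the full context window and
    one optimizer step consumes exactly one batch.
    """
    return steps * batch_size * context_window


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity as exp(mean negative log-likelihood per token), in nats."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Values taken from the Technical details section.
budget = tokens_seen(steps=200, batch_size=4, context_window=2048)
print(budget)  # 1638400

# Mean per-token loss (nats) that corresponds to a perplexity of 15.2.
mean_loss = math.log(15.2)
print(round(mean_loss, 2))  # 2.72

# Round trip: a constant per-token loss of mean_loss gives perplexity 15.2.
print(round(perplexity([mean_loss] * 8), 1))  # 15.2
```

Under these assumptions the additional tuning covers at most about 1.6 million tokens, and a perplexity of 15.2 corresponds to a mean validation loss of roughly 2.72 nats per token.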