---
library_name: transformers
license: apache-2.0
datasets:
- Vikhrmodels/Flan_translated_300k
- d0rj/OpenOrca-ru
language:
- ru
- en
---

# Model Card for ru-rope-t5-small-instruct

A small Russian Rotary Position Embedding (RoPE) T5 model after instruction tuning.

## Model Details

The model was trained on a Russian corpus with a mix of English using the [Mixture-Of-Denoisers](https://arxiv.org/abs/2205.05131v1) pre-training method from [UL2](https://huggingface.co/google/ul2) on sequences of length 1024.

Training with Flash Attention 2 is possible because the relative position bias is replaced with rotary position encoding; see the loading sketch after the list below.

- **Model type:** [RoPE T5](https://huggingface.co/melmoth/ru-rope-t5-small-instruct/blob/main/t5.py)
- **Language(s) (NLP):** Russian, English
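
A minimal loading sketch under stated assumptions: that the custom RoPE T5 code in `t5.py` is reachable through `trust_remote_code=True` auto-class mappings, and that a CUDA build of flash-attn 2 is installed. If the auto mapping is not registered in the repository, the classes from `t5.py` would need to be imported directly instead.

```python
# Loading sketch (assumptions: t5.py is exposed via trust_remote_code, flash-attn 2 is installed).
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "melmoth/ru-rope-t5-small-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    trust_remote_code=True,                    # custom RoPE T5 implementation from t5.py
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",   # possible because there is no relative position bias
).to("cuda")

# Illustrative prompt only; zero-shot quality is limited for a model of this size.
inputs = tokenizer("Переведи на английский: Привет, мир!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```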

## Uses

Fine-tuning for downstream tasks (see the sketch below).
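
A fine-tuning sketch with the standard `transformers` Seq2Seq utilities. The dataset name, column names, and hyperparameters are placeholders, not the settings used for this checkpoint.

```python
# Hypothetical fine-tuning sketch: dataset, columns and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
    Seq2SeqTrainer, Seq2SeqTrainingArguments,
)

model_id = "melmoth/ru-rope-t5-small-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

raw = load_dataset("my_org/my_ru_task")  # placeholder dataset with "input" / "target" columns

def preprocess(batch):
    enc = tokenizer(batch["input"], truncation=True, max_length=1024)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=256)
    enc["labels"] = labels["input_ids"]
    return enc

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="rope-t5-small-finetuned",
        per_device_train_batch_size=16,
        learning_rate=1e-4,
        num_train_epochs=3,
        bf16=True,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```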

## Bias, Risks, and Limitations

Despite the instruction tuning, zero-shot use is not recommended because of the model's small size.

## Training Details

### Training Data

A corpus of Russian texts from [Vikhr](https://huggingface.co/Vikhrmodels) filtered by [FRED-T5-1.7B](https://huggingface.co/ai-forever/FRED-T5-1.7B) perplexity. The instructions are English sets translated into Russian.
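
The exact filtering pipeline is not published here; the sketch below only illustrates the general idea of perplexity-based filtering with FRED-T5-1.7B. The `<LM>` prefix, the prefix/target split, and the threshold are assumptions.

```python
# Hypothetical perplexity filter: scores each text with FRED-T5-1.7B in its <LM>
# (language-modeling) mode and keeps texts whose perplexity is below a threshold.
# Prefix, split point and threshold are assumptions, not the actual pipeline.
import math
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

scorer_id = "ai-forever/FRED-T5-1.7B"
tokenizer = AutoTokenizer.from_pretrained(scorer_id)
scorer = T5ForConditionalGeneration.from_pretrained(scorer_id, torch_dtype=torch.bfloat16).to("cuda").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    # First half of the text conditions the encoder, second half is scored by the decoder.
    words = text.split()
    prefix, target = " ".join(words[: len(words) // 2]), " ".join(words[len(words) // 2 :])
    enc = tokenizer("<LM>" + prefix, return_tensors="pt").to("cuda")
    labels = tokenizer(target, return_tensors="pt").input_ids.to("cuda")
    loss = scorer(**enc, labels=labels).loss  # mean cross-entropy per target token
    return math.exp(loss.item())

corpus = ["Пример связного текста на русском языке.", "плохой ткст с ошибками"]
filtered = [t for t in corpus if perplexity(t) < 100.0]  # threshold is arbitrary here
```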

### Training Procedure

AdamWScale is used instead of Adafactor for stable training without loss spikes; a sketch of the scaling idea is given below.
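
`AdamWScale` here refers to an AdamW variant whose step size is additionally scaled by the root-mean-square of each weight tensor, in the spirit of Adafactor's parameter scaling (the approach popularized by nanoT5). The sketch below illustrates that idea only and is not the exact optimizer code used for training.

```python
# Sketch of the AdamWScale idea: plain AdamW plus an Adafactor-style scaling of the
# step size by the RMS of the parameter tensor. Hyperparameters are placeholders.
import math
import torch
from torch.optim import Optimizer


class AdamWScaleSketch(Optimizer):
    def __init__(self, params, lr=1e-2, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay))

    @staticmethod
    def _rms(t: torch.Tensor) -> float:
        return (t.norm(2) / t.numel() ** 0.5).item()

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                state["step"] += 1
                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                exp_avg.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                denom = exp_avg_sq.sqrt().add_(group["eps"])
                # Bias-corrected AdamW step size ...
                step_size = group["lr"] * math.sqrt(1 - beta2 ** state["step"]) / (1 - beta1 ** state["step"])
                # ... additionally scaled by the RMS of the weights (Adafactor-style parameter scaling).
                step_size *= max(self._rms(p), group["eps"])
                p.addcdiv_(exp_avg, denom, value=-step_size)
                if group["weight_decay"] > 0.0:
                    # Decoupled weight decay, as in AdamW.
                    p.add_(p, alpha=-group["lr"] * group["weight_decay"])
        return loss


# Usage: optimizer = AdamWScaleSketch(model.parameters(), lr=1e-2)
```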

#### Metrics

![image/png](https://i.ibb.co/QFJP1gq/Wandb.png)

## Model Card Contact

[@TheMelmoth](https://t.me/TheMelmoth) |