---
library_name: transformers
license: apache-2.0
datasets:
- Vikhrmodels/Flan_translated_300k
- d0rj/OpenOrca-ru
language:
- ru
- en
---
# Model Card for ru-rope-t5-small-instruct
The small version of the Russian Rotary Position Embedding T5 model after instruction tuning.
## Model Details
The model was trained on a Russian corpus with a mix of English using the [Mixture-Of-Denoisers](https://arxiv.org/abs/2205.05131v1) pre-training objective from [UL2](https://huggingface.co/google/ul2) on sequences of length 1024.
Because the relative position bias is replaced with rotary position encoding, training with Flash Attention 2 is supported (see the loading sketch below).
- **Model type:** [RoPE T5](https://huggingface.co/melmoth/ru-rope-t5-small-instruct/blob/main/t5.py)
- **Language(s) (NLP):** Russian, English
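
A minimal loading sketch. It assumes the repository's custom RoPE T5 implementation in `t5.py` can be picked up via `trust_remote_code`; if it is not registered that way, import the classes from `t5.py` directly. The Flash Attention 2 flag and dtype are illustrative, not requirements.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "melmoth/ru-rope-t5-small-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    trust_remote_code=True,                    # custom RoPE T5 code lives in t5.py
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",   # possible because bias is replaced with RoPE
)

# Example prompt: "Translate into English: Hello, world!"
inputs = tokenizer("Переведи на английский: Привет, мир!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```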
## Uses
The model is intended as a base for fine-tuning on downstream tasks, for example as sketched below.
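
A hedged fine-tuning sketch using the standard `Seq2SeqTrainer`. The dataset split, the `question`/`response` column names, and all hyperparameters are illustrative assumptions, not values from this card.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "melmoth/ru-rope-t5-small-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# Column names "question"/"response" are an assumption about the dataset schema.
dataset = load_dataset("d0rj/OpenOrca-ru", split="train[:1000]")

def preprocess(batch):
    inputs = tokenizer(batch["question"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["response"], max_length=256, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="ru-rope-t5-small-downstream",
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=50,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```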
## Bias, Risks, and Limitations
Despite the instruction tuning, zero-shot use is not recommended because of the model's small size.
## Training Details
### Training Data
A corpus of Russian texts from [Vikhr](https://huggingface.co/Vikhrmodels), filtered by [FRED-T5-1.7B](https://huggingface.co/ai-forever/FRED-T5-1.7B) perplexity. The instructions are a translated English set.
### Training Procedure
AdamWScale is used instead of Adafactor for stable training without loss spikes; a sketch of the idea follows.
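
This card does not include the optimizer code. Below is a rough sketch of the AdamWScale idea as popularized by the nanoT5 recipe: standard AdamW whose per-parameter step size is additionally multiplied by the parameter's RMS, in the spirit of Adafactor's relative updates. Constants and details here are illustrative and may differ from what was actually used for this model.

```python
import math
import torch
from torch.optim import Optimizer

class AdamWScale(Optimizer):
    """AdamW with the step size scaled by the RMS of each parameter (Adafactor-style)."""

    def __init__(self, params, lr=1e-2, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super().__init__(params, defaults)

    @staticmethod
    def _rms(tensor):
        return tensor.norm(2) / math.sqrt(tensor.numel())

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                state["step"] += 1

                # Standard Adam moment updates with bias correction.
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                denom = exp_avg_sq.sqrt().add_(group["eps"])

                bias_c1 = 1 - beta1 ** state["step"]
                bias_c2 = 1 - beta2 ** state["step"]
                step_size = group["lr"] * math.sqrt(bias_c2) / bias_c1

                # Adafactor-style relative update: scale the step by the parameter RMS.
                step_size = step_size * max(1e-3, self._rms(p).item())

                p.addcdiv_(exp_avg, denom, value=-step_size)

                # Decoupled weight decay, as in AdamW.
                if group["weight_decay"] > 0.0:
                    p.add_(p, alpha=-group["lr"] * group["weight_decay"])

        return loss
```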
## Model Card Contact
[@TheMelmoth](https://t.me/TheMelmoth)