---
library_name: transformers
license: apache-2.0
datasets:
- Vikhrmodels/Flan_translated_300k
- d0rj/OpenOrca-ru
language:
- ru
- en
---

# Model Card for ru-rope-t5-small-instruct

A small Russian Rotary Position Embedding (RoPE) T5 model after instruction tuning.

## Model Details

The model was trained on a Russian corpus with a mix of English using the [Mixture-Of-Denoisers](https://arxiv.org/abs/2205.05131v1) pre-training method from [UL2](https://huggingface.co/google/ul2) on sequences of length 1024.

Training with Flash Attention 2 is possible because the relative position bias is replaced with rotary position encoding; see the loading sketch after the list below.

- **Model type:** [RoPE T5](https://huggingface.co/melmoth/ru-rope-t5-small-instruct/blob/main/t5.py)
- **Language(s) (NLP):** Russian, English
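
A minimal loading sketch under stated assumptions: that the custom RoPE T5 code in `t5.py` is reachable through `trust_remote_code=True` auto-class mappings, and that a CUDA build of flash-attn 2 is installed. If the auto mapping is not registered in the repository, the classes from `t5.py` would need to be imported directly instead.

```python
# Loading sketch (assumptions: t5.py is exposed via trust_remote_code, flash-attn 2 is installed).
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "melmoth/ru-rope-t5-small-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    trust_remote_code=True,                    # custom RoPE T5 implementation from t5.py
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",   # possible because there is no relative position bias
).to("cuda")

# Illustrative prompt only; zero-shot quality is limited for a model of this size.
inputs = tokenizer("Переведи на английский: Привет, мир!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```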

## Uses

Fine-tuning for downstream tasks (see the sketch below).
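
A fine-tuning sketch with the standard `transformers` Seq2Seq utilities. The dataset name, column names, and hyperparameters are placeholders, not the settings used for this checkpoint.

```python
# Hypothetical fine-tuning sketch: dataset, columns and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
    Seq2SeqTrainer, Seq2SeqTrainingArguments,
)

model_id = "melmoth/ru-rope-t5-small-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

raw = load_dataset("my_org/my_ru_task")  # placeholder dataset with "input" / "target" columns

def preprocess(batch):
    enc = tokenizer(batch["input"], truncation=True, max_length=1024)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=256)
    enc["labels"] = labels["input_ids"]
    return enc

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="rope-t5-small-finetuned",
        per_device_train_batch_size=16,
        learning_rate=1e-4,
        num_train_epochs=3,
        bf16=True,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```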

## Bias, Risks, and Limitations

Despite the instruction tuning, zero-shot use is not recommended because of the model's small size.

## Training Details

### Training Data

A corpus of Russian texts from [Vikhr](https://huggingface.co/Vikhrmodels) filtered by [FRED-T5-1.7B](https://huggingface.co/ai-forever/FRED-T5-1.7B) perplexity. The instructions are English sets translated into Russian.
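
The exact filtering pipeline is not published here; the sketch below only illustrates the general idea of perplexity-based filtering with FRED-T5-1.7B. The `<LM>` prefix, the prefix/target split, and the threshold are assumptions.

```python
# Hypothetical perplexity filter: scores each text with FRED-T5-1.7B in its <LM>
# (language-modeling) mode and keeps texts whose perplexity is below a threshold.
# Prefix, split point and threshold are assumptions, not the actual pipeline.
import math
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

scorer_id = "ai-forever/FRED-T5-1.7B"
tokenizer = AutoTokenizer.from_pretrained(scorer_id)
scorer = T5ForConditionalGeneration.from_pretrained(scorer_id, torch_dtype=torch.bfloat16).to("cuda").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    # First half of the text conditions the encoder, second half is scored by the decoder.
    words = text.split()
    prefix, target = " ".join(words[: len(words) // 2]), " ".join(words[len(words) // 2 :])
    enc = tokenizer("<LM>" + prefix, return_tensors="pt").to("cuda")
    labels = tokenizer(target, return_tensors="pt").input_ids.to("cuda")
    loss = scorer(**enc, labels=labels).loss  # mean cross-entropy per target token
    return math.exp(loss.item())

corpus = ["Пример связного текста на русском языке.", "плохой ткст с ошибками"]
filtered = [t for t in corpus if perplexity(t) < 100.0]  # threshold is arbitrary here
```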

### Training Procedure

AdamWScale is used instead of Adafactor for stable training without loss spikes; a sketch of the scaling idea is given below.
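
`AdamWScale` here refers to an AdamW variant whose step size is additionally scaled by the root-mean-square of each weight tensor, in the spirit of Adafactor's parameter scaling (the approach popularized by nanoT5). The sketch below illustrates that idea only and is not the exact optimizer code used for training.

```python
# Sketch of the AdamWScale idea: plain AdamW plus an Adafactor-style scaling of the
# step size by the RMS of the parameter tensor. Hyperparameters are placeholders.
import math
import torch
from torch.optim import Optimizer


class AdamWScaleSketch(Optimizer):
    def __init__(self, params, lr=1e-2, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay))

    @staticmethod
    def _rms(t: torch.Tensor) -> float:
        return (t.norm(2) / t.numel() ** 0.5).item()

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                state["step"] += 1
                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                exp_avg.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                denom = exp_avg_sq.sqrt().add_(group["eps"])
                # Bias-corrected AdamW step size ...
                step_size = group["lr"] * math.sqrt(1 - beta2 ** state["step"]) / (1 - beta1 ** state["step"])
                # ... additionally scaled by the RMS of the weights (Adafactor-style parameter scaling).
                step_size *= max(self._rms(p), group["eps"])
                p.addcdiv_(exp_avg, denom, value=-step_size)
                if group["weight_decay"] > 0.0:
                    # Decoupled weight decay, as in AdamW.
                    p.add_(p, alpha=-group["lr"] * group["weight_decay"])
        return loss


# Usage: optimizer = AdamWScaleSketch(model.parameters(), lr=1e-2)
```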

#### Metrics

![image/png](https://i.ibb.co/QFJP1gq/Wandb.png)

## Model Card Contact

[@TheMelmoth](https://t.me/TheMelmoth) |