---
license: apache-2.0
language:
- ru
pipeline_tag: text-generation
---

# ruadapt_mistral_7b_v0.1

This model is a version of mistralai/Mistral-7B-v0.1 fine-tuned (embeddings and LM head only) on a 33 GB Russian dataset. Training ran for 0.8 epochs before it was interrupted by an error; the model was then briefly further trained with LoRA.

**ATTENTION!** Metrics on various evaluation datasets are slightly worse than those of the original model.
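
The model can still be used directly for Russian text generation. Below is a minimal sketch with Transformers; the model id is a placeholder, so substitute the full `<org>/<repo>` id of this repository.

```python
# Minimal text-generation sketch (assumed usage; the model id is a placeholder).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ruadapt_mistral_7b_v0.1"  # placeholder: replace with the full hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~14 GB in half precision for a 7B model
    device_map="auto",
)

prompt = "Столица России"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```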

## Model description

Russian adaptation of Mistral-7B-v0.1, obtained by replacing the tokenizer with a Russian-oriented one and retraining the embeddings and LM head.

Paper: Tikhomirov M., Chernyshev D. Impact of Tokenization on LLaMa Russian Adaptation. arXiv preprint arXiv:2312.02598, 2023.
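
Because the adaptation hinges on the tokenizer swap, its effect is easy to inspect by tokenizing the same Russian text with both tokenizers. A quick sketch (the adapted-model id is again a placeholder):

```python
# Compare how many tokens each tokenizer spends on the same Russian sentence.
from transformers import AutoTokenizer

orig = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
adapted = AutoTokenizer.from_pretrained("ruadapt_mistral_7b_v0.1")  # placeholder id

text = "Языковая модель адаптирована для русского языка."
print("original:", len(orig.tokenize(text)), "tokens")
print("adapted: ", len(adapted.tokenize(text)), "tokens")  # expected to be fewer
```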

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
- learning_rate: 2e-05
- train_batch_size: 6
- eval_batch_size: 6
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 2
- total_train_batch_size: 192
- total_eval_batch_size: 96
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
- lr_scheduler_type: linear
- num_epochs: 2.0
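
As a rough reconstruction, these settings map onto `transformers.TrainingArguments` as below. This is an assumption: the card does not state that the Hugging Face `Trainer` was used, and the 16-GPU layout is left to the launcher (e.g. `torchrun`).

```python
# Sketch: the reported hyperparameters expressed as TrainingArguments
# (assumed setup; per-device batch 6 x 16 GPUs x 2 accumulation steps = 192 total).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ruadapt_mistral_7b_v0.1",
    learning_rate=2e-5,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=2.0,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-5,
)
```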

### Framework versions

- Transformers 4.34.0
- PyTorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.14.1