rccmsu committed
Commit 48a3b77 · verified · 1 Parent(s): 44ccf3f

Create README.md

Files changed (1): README.md (+41 -0)
---
license: apache-2.0
language:
- ru
pipeline_tag: text-generation
---

# ruadapt_mistral_7b_v0.1

This model is a fine-tuned (embeddings and LM head only) version of mistralai/Mistral-7B-v0.1 on a 33 GB Russian dataset. Training ran for 0.8 epochs before it was interrupted by an error; the model was then briefly trained further with LoRA.

ATTENTION!!!
The metrics on various datasets are slightly worse than those of the original model.
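
A minimal text-generation sketch with Hugging Face Transformers. The repository id `rccmsu/ruadapt_mistral_7b_v0.1`, the prompt, and the generation settings are assumptions for illustration, not part of the original card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; adjust if the model lives under a different name.
model_id = "rccmsu/ruadapt_mistral_7b_v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Russian prompt, since the model is adapted to Russian.
prompt = "Столица России"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```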

## Model description

Russian adaptation of Mistral-7B, obtained by replacing the tokenizer with a Russian one; a rough sketch of the recipe follows the paper reference below.

Paper: Tikhomirov M., Chernyshev D. Impact of Tokenization on LLaMa Russian Adaptation // arXiv preprint arXiv:2312.02598, 2023.
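
The recipe can be sketched roughly as: swap in a Russian tokenizer, resize the embedding matrix and LM head to the new vocabulary, and train only those weights. This is a hedged illustration, not the authors' code; the tokenizer path `ru_tokenizer` and the parameter-name matching are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
# Hypothetical tokenizer trained on Russian text.
ru_tokenizer = AutoTokenizer.from_pretrained("ru_tokenizer")

# Grow/shrink the input embeddings and LM head to the new vocabulary size.
model.resize_token_embeddings(len(ru_tokenizer))

# Train only the embeddings and LM head, per the description above.
for name, param in model.named_parameters():
    param.requires_grad = "embed_tokens" in name or "lm_head" in name
```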

### Training hyperparameters

The following hyperparameters were used during training (the batch-size totals are sanity-checked after the list):
- learning_rate: 2e-05
- train_batch_size: 6
- eval_batch_size: 6
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 2
- total_train_batch_size: 192
- total_eval_batch_size: 96
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
- lr_scheduler_type: linear
- num_epochs: 2.0

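For reference, the reported totals follow from the per-device settings; a quick check of the arithmetic (standard Hugging Face Trainer accounting, not part of the original card):

```python
train_batch_size = 6               # per device
num_devices = 16
gradient_accumulation_steps = 2

total_train = train_batch_size * num_devices * gradient_accumulation_steps
total_eval = 6 * num_devices       # eval uses no gradient accumulation

assert total_train == 192
assert total_eval == 96
```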

### Framework versions

- Transformers 4.34.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.14.1