kehuitt committed (verified)
Commit 537d288 · 1 Parent(s): c9739cc

Create README.md

Files changed (1): README.md ADDED (+44 -0)
This model is a LoRA fine-tuned version of Mixtral-8x22B-Instruct-v0.1 on the MBPP (Mostly Basic Python Problems) dataset.

# Model description
More information needed

# Intended uses & limitations
More information needed

# Training and evaluation data
More information needed

# Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows these lists):

### method
- stage: sft
- finetuning_type: lora
- lora_target: all
- deepspeed: examples/deepspeed/ds_z3_offload_config.json

### dataset
- dataset: mbpp
- template: mistral
- cutoff_len: 2048
- max_samples: 316
- overwrite_cache: true
- preprocessing_num_workers: 16

### train
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 2
- learning_rate: 1.0e-4
- num_train_epochs: 3
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: true
- ddp_timeout: 180000000
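The option names above (stage, finetuning_type, lora_target, template, cutoff_len) and the `examples/deepspeed/ds_z3_offload_config.json` path match LLaMA-Factory's YAML training config format. Below is a minimal, unverified sketch of how such a run could be reproduced under that assumption; the base model path, the `do_train`/`output_dir` keys, and the `llamafactory-cli` invocation are not stated in this card and are assumptions.

```python
# Minimal sketch: rebuild a training config from the values listed above and
# launch LLaMA-Factory. Assumptions (not stated in this card): the base model
# path, the do_train/output_dir keys, and the llamafactory-cli entry point.
import subprocess
from pathlib import Path

config = """\
### model (assumed path)
model_name_or_path: mistralai/Mixtral-8x22B-Instruct-v0.1

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
deepspeed: examples/deepspeed/ds_z3_offload_config.json

### dataset
dataset: mbpp
template: mistral
cutoff_len: 2048
max_samples: 316
overwrite_cache: true
preprocessing_num_workers: 16

### output (assumed)
output_dir: saves/mixtral-8x22b-instruct/lora/sft-mbpp

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
"""

# Write the config and start training.
Path("mixtral_mbpp_lora_sft.yaml").write_text(config)
subprocess.run(["llamafactory-cli", "train", "mixtral_mbpp_lora_sft.yaml"], check=True)
```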

# Framework versions
- PEFT 0.14.0
- Transformers 4.47.0
- PyTorch 2.5.1+cu124
- Datasets 2.14.6
- Tokenizers 0.21.0
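Since PEFT is listed above, the published weights are presumably a LoRA adapter rather than a merged model. A minimal inference sketch under that assumption follows; the adapter repo id is a placeholder, and the chat formatting simply relies on the base tokenizer's template (consistent with the `mistral` template used for training).

```python
# Minimal inference sketch: load the base model and apply the LoRA adapter.
# The adapter_id below is a placeholder, not a confirmed repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"
adapter_id = "path/to/this-adapter-repo"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Format a single-turn prompt with the tokenizer's chat template and generate.
messages = [{"role": "user", "content": "Write a Python function that checks whether a number is prime."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```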