jondurbin committed
Commit 6d5b10d · 1 Parent(s): 1cd6bd0

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -12,6 +12,8 @@ Differences in the qlora scripts:
 
 __I think there's a bug in gradient accumulation, so if you try this, maybe set gradient accumulation steps to 1__
 
+__5 epochs seemed to achieve the best results, but YMMV__
+
 Full example of tuning (used for airoboros-mpt-30b-gpt4-1.4):
 
 ```
@@ -23,11 +25,11 @@ export WANDB_PROJECT=airoboros-mpt-30b-gpt4-1.4
 python qlora.py \
     --model_name_or_path ./mpt-30b \
     --output_dir ./$WANDB_PROJECT-checkpoints \
-    --num_train_epochs 3 \
+    --num_train_epochs 5 \
     --logging_steps 1 \
     --save_strategy steps \
     --data_seed 11422 \
-    --save_steps 75 \
+    --save_steps 100 \
     --save_total_limit 3 \
     --evaluation_strategy "no" \
     --eval_dataset_size 2 \
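
For reference, the post-commit command with the README's gradient-accumulation workaround applied might look like the sketch below. This is not part of the commit: the `--gradient_accumulation_steps` flag is an assumption (it is a standard Hugging Face `TrainingArguments` field), and the diff truncates after `--eval_dataset_size`, so any remaining flags are omitted here.

```
# Sketch only: flag values as updated by this commit, plus gradient
# accumulation pinned to 1 per the README's bug note. Assumes qlora.py
# accepts --gradient_accumulation_steps (standard HF TrainingArguments).
export WANDB_PROJECT=airoboros-mpt-30b-gpt4-1.4
python qlora.py \
    --model_name_or_path ./mpt-30b \
    --output_dir ./$WANDB_PROJECT-checkpoints \
    --num_train_epochs 5 \
    --gradient_accumulation_steps 1 \
    --logging_steps 1 \
    --save_strategy steps \
    --data_seed 11422 \
    --save_steps 100 \
    --save_total_limit 3 \
    --evaluation_strategy "no" \
    --eval_dataset_size 2
```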