diff --git "a/log.txt" "b/log.txt" new file mode 100644--- /dev/null +++ "b/log.txt" @@ -0,0 +1,6899 @@ +The following values were not passed to `accelerate launch` and had defaults used instead: + `--num_processes` was set to a value of `4` + More than one GPU was found, enabling multi-GPU training. + If this was unintended please pass in `--num_processes=1`. + `--num_machines` was set to a value of `1` + `--mixed_precision` was set to a value of `'no'` + `--dynamo_backend` was set to a value of `'no'` +To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. +gradient_accumulation_steps: 4 +gradient_accumulation_steps: 4 +Params using prompt template alpaca: +base_model: baichuan-inc/Baichuan2-7B-Base +data_path: ../../data/belle_dolphine/p14.jsonl +output_dir: ../out/lora/p14 +batch_size: 32 +micro_batch_size: 2 +num_epochs: 1 +learning_rate: 0.0004 +cutoff_len: 4096 +val_set_size: 0 +lr_scheduler: cosine +warmup_steps: 100 +lora_r: 16 +lora_alpha: 16 +lora_dropout: 0.05 +lora_target_modules: ['gate_proj', 'down_proj', 'up_proj'] +train_on_inputs: False +add_eos_token: False +group_by_length: False +wandb_project: lora-moe +wandb_run_name: belle_dolphine-p14 +wandb_watch: +wandb_log_model: +resume_from_checkpoint: False + +gradient_accumulation_steps: 4 +gradient_accumulation_steps: 4 + Loading checkpoint shards: 0%| | 0/2 [00:00 It should be 1 2 None +pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None + Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 8.57s/it] Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 9.16s/it] + Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 8.65s/it] Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 9.23s/it] +trainable params: 23,199,744 || all params: 7,529,172,992 || trainable%: 0.30813137146205183 + Map: 0%| | 0/217273 [00:00 It should be 1 2 None + Map: 0%| | 477/217273 [00:00<04:06, 877.81 examples/s] Map: 0%| | 279/217273 [00:00<03:53, 929.86 examples/s] Map: 0%| | 568/217273 [00:00<04:04, 887.50 examples/s] Map: 0%| | 375/217273 [00:00<03:50, 940.93 examples/s]pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None + Map: 0%| | 661/217273 [00:00<04:00, 899.15 examples/s] Map: 0%| | 476/217273 [00:00<03:46, 956.82 examples/s] Map: 0%| | 753/217273 [00:00<03:59, 903.09 examples/s] Map: 0%| | 611/217273 [00:00<03:53, 926.64 examples/s] Map: 0%| | 845/217273 [00:00<03:59, 904.79 examples/s] Map: 0%| | 707/217273 [00:00<03:51, 934.70 examples/s] Map: 0%| | 936/217273 [00:01<03:58, 905.97 examples/s] Map: 0%| | 802/217273 [00:00<03:51, 933.82 examples/s] Map: 0%| | 897/217273 [00:00<03:51, 936.60 examples/s] Map: 0%| | 1052/217273 [00:01<04:40, 769.66 examples/s] Map: 1%| | 1150/217273 [00:01<04:23, 820.10 examples/s] Map: 0%| | 1000/217273 [00:01<04:47, 751.11 examples/s] Map: 1%| | 1282/217273 [00:01<04:17, 838.23 examples/s] Map: 1%| | 1097/217273 [00:01<04:28, 803.73 examples/s] Map: 1%| | 1378/217273 [00:01<04:09, 865.36 examples/s] Map: 1%| | 1197/217273 [00:01<04:13, 851.56 examples/s] Map: 1%| | 1471/217273 [00:01<04:05, 878.26 examples/s] Map: 1%| | 1294/217273 [00:01<04:05, 879.07 examples/s] Map: 1%| | 1387/217273 [00:01<04:03, 885.78 examples/s] Map: 1%| | 1603/217273 [00:01<04:06, 876.40 examples/s]trainable params: 23,199,744 || all params: 7,529,172,992 || trainable%: 0.30813137146205183 + Map: 1%| | 1481/217273 [00:01<04:00, 898.80 examples/s] Map: 1%| | 1702/217273 [00:01<03:58, 904.13 
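The `gradient_accumulation_steps: 4` lines (printed once per rank) are consistent with the usual alpaca-lora-style derivation from the printed batch sizes; the snippet below reconstructs that arithmetic under this assumption, with illustrative variable names. The defaults warning at the top of the log can likewise be silenced by passing `--num_processes`, `--num_machines`, `--mixed_precision`, and `--dynamo_backend` explicitly to `accelerate launch`, or by running `accelerate config` once.

    # Hypothetical reconstruction of the logged value; not taken from the script.
    batch_size = 32        # effective global batch size (params dump)
    micro_batch_size = 2   # per-device batch size (params dump)
    world_size = 4         # GPUs, per the `--num_processes` default in the warning

    gradient_accumulation_steps = batch_size // micro_batch_size // world_size
    print(gradient_accumulation_steps)  # 4, matching the log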