The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `4`
		More than one GPU was found, enabling multi-GPU training. If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
gradient_accumulation_steps: 4
Params using prompt template alpaca:
base_model: baichuan-inc/Baichuan2-7B-Base
data_path: ../../data/belle_dolphine/p11.jsonl
output_dir: ../out/lora/p11
batch_size: 32
micro_batch_size: 2
num_epochs: 1
learning_rate: 0.0004
cutoff_len: 4096
val_set_size: 0
lr_scheduler: cosine
warmup_steps: 100
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['gate_proj', 'down_proj', 'up_proj']
train_on_inputs: False
add_eos_token: False
group_by_length: False
wandb_project: lora-moe
wandb_run_name: belle_dolphine-p11
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
gradient_accumulation_steps: 4
Loading checkpoint shards: 100%|██████████| 2/2 [00:21<00:00, 10.84s/it]
pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None
trainable params: 23,199,744 || all params: 7,529,172,992 || trainable%: 0.30813137146205183
Map:   1%|          | 675/67764 [00:00<00:58, 1137.23 examples/s]
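The `gradient_accumulation_steps: 4` value printed per process is consistent with the usual alpaca-lora-style derivation from the logged config: the global batch size divided by the per-device micro batch size times the number of processes. A minimal sketch of that arithmetic (variable names are illustrative, not taken from the actual script):

```python
# Sketch: reproduce the gradient_accumulation_steps value seen in the log,
# assuming the common derivation used by alpaca-lora-style training scripts.
batch_size = 32        # global effective batch size (logged config)
micro_batch_size = 2   # per-GPU batch size (logged config)
world_size = 4         # --num_processes reported by accelerate

# Each optimizer step should consume `batch_size` examples in total, so each
# of the `world_size` processes accumulates gradients over several micro batches.
gradient_accumulation_steps = batch_size // (micro_batch_size * world_size)
print(gradient_accumulation_steps)  # -> 4, matching the logged value
```

If the run were launched with `--num_processes=1` instead, the same formula would yield 16 accumulation steps to preserve the effective batch size of 32.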
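The logged `trainable params` figure can be reproduced from the LoRA configuration. Each adapted linear layer of shape (d_in, d_out) gains r·(d_in + d_out) parameters from its two low-rank matrices. The check below assumes the standard Baichuan2-7B dimensions (32 layers, hidden size 4096, MLP intermediate size 11008); those figures come from the public model config, not from this log:

```python
# Sanity check of the logged "trainable params" for LoRA with r=16 on the three
# MLP projections. Model dimensions are assumed from the public Baichuan2-7B
# config (32 layers, hidden 4096, intermediate 11008).
r = 16
num_layers = 32
hidden, intermediate = 4096, 11008

# LoRA adds A: (d_in, r) and B: (r, d_out) per adapted layer,
# i.e. r * (d_in + d_out) extra trainable parameters.
per_module = {
    "gate_proj": r * (hidden + intermediate),  # 4096 -> 11008
    "up_proj":   r * (hidden + intermediate),  # 4096 -> 11008
    "down_proj": r * (intermediate + hidden),  # 11008 -> 4096
}
trainable = num_layers * sum(per_module.values())
print(trainable)  # -> 23199744, matching the log

all_params = 7_529_172_992
print(100 * trainable / all_params)  # ~0.3081, matching the logged trainable%
```

That the computed total lands exactly on 23,199,744 supports the assumed dimensions, and the ratio reproduces the logged trainable% of ~0.308.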