diff --git "a/log.txt" "b/log.txt"
new file mode 100644
--- /dev/null
+++ "b/log.txt"
@@ -0,0 +1,3552 @@
+The following values were not passed to `accelerate launch` and had defaults used instead:
+	`--num_processes` was set to a value of `4`
+		More than one GPU was found, enabling multi-GPU training.
+		If this was unintended please pass in `--num_processes=1`.
+	`--num_machines` was set to a value of `1`
+	`--mixed_precision` was set to a value of `'no'`
+	`--dynamo_backend` was set to a value of `'no'`
+To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
+Params using prompt template alpaca:
+base_model: baichuan-inc/Baichuan2-7B-Base
+data_path: ../../data/belle_dolphine/p13.jsonl
+output_dir: ../out/lora/p13
+batch_size: 32
+micro_batch_size: 2
+num_epochs: 1
+learning_rate: 0.0004
+cutoff_len: 4096
+val_set_size: 0
+lr_scheduler: cosine
+warmup_steps: 100
+lora_r: 16
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_modules: ['gate_proj', 'down_proj', 'up_proj']
+train_on_inputs: False
+add_eos_token: False
+group_by_length: False
+wandb_project: lora-moe
+wandb_run_name: belle_dolphine-p13
+wandb_watch: 
+wandb_log_model: 
+resume_from_checkpoint: False
+
+gradient_accumulation_steps: 4
+gradient_accumulation_steps: 4
+gradient_accumulation_steps: 4
+gradient_accumulation_steps: 4
+ Loading checkpoint shards: 0%| | 0/2 [00:00 It should be 1 2 None
+pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None
+ Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 8.45s/it] Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 9.02s/it]
+pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None
+pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None
+trainable params: 23,199,744 || all params: 7,529,172,992 || trainable%: 0.30813137146205183trainable params: 23,199,744 || all params: 7,529,172,992 || trainable%: 0.30813137146205183
+
+ Map: 0%| | 0/110379 [00:00