diff --git "a/stderr.log" "b/stderr.log"
new file mode 100644
--- /dev/null
+++ "b/stderr.log"
@@ -0,0 +1,90 @@
++ deepspeed --num_nodes=1 --num_gpus=8 --master_port 35109 --module safe_rlhf.finetune --train_datasets bt --model_name_or_path cerebras/btlm-3b-8k-base --max_length 8092 --trust_remote_code True --epochs 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 2 --gradient_accumulation_steps 1 --gradient_checkpointing --learning_rate 4.7e-6 --lr_scheduler_type cosine --num_warmup_steps 20 --weight_decay 0.0 --seed 42 --output_dir /home/paperspace/safe-rlhf/output/sft --log_type wandb --log_project BT-Training --zero_stage 2 --bf16 True --tf32 True
+Using pad_token, but it is not set yet.
+Using pad_token, but it is not set yet.
+Using pad_token, but it is not set yet.
+Using pad_token, but it is not set yet.
+Using pad_token, but it is not set yet.
+Using pad_token, but it is not set yet.
+Using pad_token, but it is not set yet.
+Using pad_token, but it is not set yet.
+WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
+WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
+WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
+WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
+WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
+WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
+WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
+WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
+WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
+WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
+WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
+WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
+WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
+WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
+WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
+WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
+Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
+Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
+Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
+Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
+Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
+Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
+Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
+Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
+Detected CUDA files, patching ldflags
+Emitting ninja build file /home/paperspace/.cache/torch_extensions/py39_cu117/fused_adam/build.ninja...
+Building extension module fused_adam...
+Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+Loading extension module fused_adam...
+Loading extension module fused_adam...
+Loading extension module fused_adam...
+Loading extension module fused_adam...
+Loading extension module fused_adam...
+Loading extension module fused_adam...
+Loading extension module fused_adam...
+Loading extension module fused_adam...
+WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+wandb: Tracking run with wandb version 0.13.4
+wandb: W&B syncing is set to `offline` in this directory.
+wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
+ Training 1/16 epoch: 0%| | 0/880 [00:00