Error while fine-tuning
I am fine-tuning this model on my own dataset. Here is the environment I used (a quick way to verify it is shown right after the list):
- python: 3.10
- torch: 2.0.1
- CUDA: 11.7
- GPU: 3090
- Additional libraries: installed as described in the instructions on Hugging Face
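For completeness, this is a quick way to double-check those versions from Python (a small generic sketch, nothing specific to this repo; note that 11.7 is the CUDA toolkit version reported by PyTorch, while cuDNN reports its own build number):

```python
# Environment sanity check (generic PyTorch calls, not specific to InternVL/Vintern).
import torch

print("torch:", torch.__version__)               # expected: 2.0.1
print("CUDA (build):", torch.version.cuda)        # expected: 11.7
print("cuDNN:", torch.backends.cudnn.version())   # cuDNN build number
print("GPU:", torch.cuda.get_device_name(0))      # expected: NVIDIA GeForce RTX 3090
```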
But I got an error that I can't fix. Please help me. The launch trace and log output are below:
/workspace/Vintern/internvl_chat
- GPUS=1
- BATCH_SIZE=1
- PER_DEVICE_BATCH_SIZE=1
- GRADIENT_ACC=1
- pwd
- export PYTHONPATH=:/workspace/Vintern/internvl_chat
- export MASTER_PORT=34229
- export TF_CPP_MIN_LOG_LEVEL=3
- export LAUNCHER=pytorch
- OUTPUT_DIR=work_dirs/internvl_chat_v2_0/Vintern_1B_v2_finetune_lora_viet_medical_vqa
- [ ! -d work_dirs/internvl_chat_v2_0/Vintern_1B_v2_finetune_lora_viet_medical_vqa ]
- torchrun --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --nproc_per_node=1 --master_port=34229 internvl/train/internvl_chat_finetune.py --model_name_or_path ../pretrained/Vintern-1B-v2 --conv_style Hermes-2 --output_dir work_dirs/internvl_chat_v2_0/Vintern_1B_v2_finetune_lora_viet_medical_vqa --meta_path shell/data/custom_fintune_datasets.json --overwrite_output_dir True --force_image_size 448 --max_dynamic_patch 6 --down_sample_ratio 0.5 --drop_path_rate 0.0 --freeze_llm True --freeze_mlp True --freeze_backbone True --use_llm_lora 16 --vision_select_layer -1 --dataloader_num_workers 4 --bf16 True --num_train_epochs 1 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --evaluation_strategy no --save_strategy steps --save_steps 500 --save_total_limit 2 --learning_rate 4e-5 --weight_decay 0.01 --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 10 --max_seq_length 700 --do_train True --grad_checkpoint True --group_by_length True --dynamic_image_size True --use_thumbnail True --ps_version v2 --deepspeed zero_stage1_config.json --report_to tensorboard
- tee -a work_dirs/internvl_chat_v2_0/Vintern_1B_v2_finetune_lora_viet_medical_vqa/training_log.txt
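For context, these flags freeze the vision backbone, the MLP projector, and the base LLM weights (--freeze_backbone/--freeze_mlp/--freeze_llm True) and train only a rank-16 LoRA adapter on the LLM (--use_llm_lora 16). As a sanity check I would expect only the LoRA parameters to require gradients; below is a minimal sketch of such a check (a hypothetical helper, not part of internvl_chat_finetune.py):

```python
# Hypothetical helper: after the script builds the model and applies the
# freeze/LoRA flags, only the LoRA adapter parameters on the LLM should
# still require gradients.
def summarize_trainable(model):
    trainable = [(name, p.numel()) for name, p in model.named_parameters() if p.requires_grad]
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {sum(numel for _, numel in trainable):,} / total: {total:,}")
    for name, numel in trainable[:10]:  # parameter names should contain 'lora_'
        print(f"  {name}: {numel:,}")
```

The full output of the run is below: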
[2024-11-12 08:00:32,227] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/opt/conda/lib/python3.10/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/conda/lib/python3.10/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/conda/lib/python3.10/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning)
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
[2024-11-12 08:00:36,992] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-12 08:00:36,992] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
11/12/2024 08:00:37 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
11/12/2024 08:00:37 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=False,
batch_eval_metrics=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=4,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=zero_stage1_config.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=no,
eval_use_gather_object=False,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_for_metrics=[],
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=4e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=work_dirs/internvl_chat_v2_0/Vintern_1B_v2_finetune_lora_viet_medical_vqa/runs/Nov12_08-00-37_c820fe065355,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=work_dirs/internvl_chat_v2_0/Vintern_1B_v2_finetune_lora_viet_medical_vqa,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=work_dirs/internvl_chat_v2_0/Vintern_1B_v2_finetune_lora_viet_medical_vqa,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=steps,
save_total_limit=2,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_liger_kernel=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.01,
)
11/12/2024 08:00:37 - INFO - __main__ - Loading Tokenizer: ../pretrained/Vintern-1B-v2
[INFO|tokenization_utils_base.py:2209] 2024-11-12 08:00:37,038 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2209] 2024-11-12 08:00:37,039 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2209] 2024-11-12 08:00:37,039 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2209] 2024-11-12 08:00:37,039 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2209] 2024-11-12 08:00:37,039 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2209] 2024-11-12 08:00:37,039 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2475] 2024-11-12 08:00:37,314 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
11/12/2024 08:00:37 - INFO - __main__ - Loading InternVLChatModel...
[INFO|configuration_utils.py:677] 2024-11-12 08:00:37,322 >> loading configuration file ../pretrained/Vintern-1B-v2/config.json
[INFO|configuration_utils.py:746] 2024-11-12 08:00:37,324 >> Model config InternVLChatConfig {
"_commit_hash": null,
"_name_or_path": "khang119966/vintern-final",
"architectures": [
"InternVLChatModel"
],
"auto_map": {
"AutoConfig": "5CD-AI/Vintern-1B-v2--configuration_internvl_chat.InternVLChatConfig",
"AutoModel": "5CD-AI/Vintern-1B-v2--modeling_internvl_chat.InternVLChatModel",
"AutoModelForCausalLM": "5CD-AI/Vintern-1B-v2--modeling_internvl_chat.InternVLChatModel"
},
"downsample_ratio": 0.5,
"dynamic_image_size": true,
"force_image_size": 448,
"llm_config": {
"_attn_implementation_autoset": false,
"_name_or_path": "Qwen/Qwen2-0.5B-Instruct",
"add_cross_attention": false,
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": 151643,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": 151645,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "silu",
"hidden_size": 896,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_range": 0.02,
"intermediate_size": 4864,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 32768,
"max_window_layers": 24,
"min_length": 0,
"model_type": "qwen2",
"no_repeat_ngram_size": 0,
"num_attention_heads": 14,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sep_token_id": null,
"sliding_window": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": "bfloat16",
"torchscript": false,
"transformers_version": "4.46.2",
"typical_p": 1.0,
"use_bfloat16": true,
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151655
},
"max_dynamic_patch": 12,
"min_dynamic_patch": 1,
"model_type": "internvl_chat",
"pad2square": false,
"ps_version": "v2",
"select_layer": -1,
"template": "Hermes-2",
"torch_dtype": "bfloat16",
"transformers_version": null,
"use_backbone_lora": 0,
"use_llm_lora": 0,
"use_thumbnail": true,
"vision_config": {
"_attn_implementation_autoset": false,
"_name_or_path": "",
"add_cross_attention": false,
"architectures": [
"InternVisionModel"
],
"attention_dropout": 0.0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"drop_path_rate": 0.0,
"dropout": 0.0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "gelu",
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"image_size": 448,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 4096,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-06,
"length_penalty": 1.0,
"max_length": 20,
"min_length": 0,
"model_type": "intern_vit_6b",
"no_repeat_ngram_size": 0,
"norm_type": "layer_norm",
"num_attention_heads": 16,
"num_beam_groups": 1,
"num_beams": 1,
"num_channels": 3,
"num_hidden_layers": 24,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"patch_size": 14,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"qk_normalization": false,
"qkv_bias": true,
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": "bfloat16",
"torchscript": false,
"transformers_version": "4.46.2",
"typical_p": 1.0,
"use_bfloat16": true,
"use_flash_attn": false
}
}
11/12/2024 08:00:37 - INFO - __main__ - Using flash_attention_2 for LLaMA
[INFO|modeling_utils.py:3934] 2024-11-12 08:00:37,325 >> loading weights file ../pretrained/Vintern-1B-v2/model.safetensors
[INFO|modeling_utils.py:1670] 2024-11-12 08:00:37,346 >> Instantiating InternVLChatModel model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1096] 2024-11-12 08:00:37,348 >> Generate config GenerationConfig {}
[INFO|configuration_utils.py:1096] 2024-11-12 08:00:37,395 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645
}
[INFO|modeling_utils.py:4800] 2024-11-12 08:00:38,361 >> All model checkpoint weights were used when initializing InternVLChatModel.
[INFO|modeling_utils.py:4808] 2024-11-12 08:00:38,361 >> All the weights of InternVLChatModel were initialized from the model checkpoint at ../pretrained/Vintern-1B-v2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use InternVLChatModel for predictions without further training.
[INFO|configuration_utils.py:1049] 2024-11-12 08:00:38,369 >> loading configuration file ../pretrained/Vintern-1B-v2/generation_config.json
[INFO|configuration_utils.py:1096] 2024-11-12 08:00:38,370 >> Generate config GenerationConfig {}