Solidity_Model_Q / running_log.txt

	[WARNING|2025-03-26 10:33:41] logging.py:162 >> `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.

	[INFO|2025-03-26 10:33:41] parser.py:355 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

	[INFO|2025-03-26 10:33:42] parser.py:355 >> Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

	[INFO|2025-03-26 10:33:42] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/config.json

	[INFO|2025-03-26 10:33:42] configuration_utils.py:746 >> Model config Qwen2Config {
	"_name_or_path": "Qwen/Qwen2.5-Coder-7B",
	"architectures": [
	"Qwen2ForCausalLM"
	],
	"attention_dropout": 0.0,
	"bos_token_id": 151643,
	"eos_token_id": 151643,
	"hidden_act": "silu",
	"hidden_size": 3584,
	"initializer_range": 0.02,
	"intermediate_size": 18944,
	"max_position_embeddings": 32768,
	"max_window_layers": 28,
	"model_type": "qwen2",
	"num_attention_heads": 28,
	"num_hidden_layers": 28,
	"num_key_value_heads": 4,
	"rms_norm_eps": 1e-06,
	"rope_scaling": null,
	"rope_theta": 1000000.0,
	"sliding_window": null,
	"tie_word_embeddings": false,
	"torch_dtype": "bfloat16",
	"transformers_version": "4.46.1",
	"use_cache": true,
	"use_sliding_window": false,
	"vocab_size": 152064
	}


	[INFO|2025-03-26 10:33:43] tokenization_utils_base.py:2211 >> loading file vocab.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/vocab.json

	[INFO|2025-03-26 10:33:43] tokenization_utils_base.py:2211 >> loading file merges.txt from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/merges.txt

	[INFO|2025-03-26 10:33:43] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/tokenizer.json

	[INFO|2025-03-26 10:33:43] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None

	[INFO|2025-03-26 10:33:43] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at None

	[INFO|2025-03-26 10:33:43] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/tokenizer_config.json

	[INFO|2025-03-26 10:33:43] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

	[INFO|2025-03-26 10:33:44] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/config.json

	[INFO|2025-03-26 10:33:44] configuration_utils.py:746 >> Model config Qwen2Config {
	"_name_or_path": "Qwen/Qwen2.5-Coder-7B",
	"architectures": [
	"Qwen2ForCausalLM"
	],
	"attention_dropout": 0.0,
	"bos_token_id": 151643,
	"eos_token_id": 151643,
	"hidden_act": "silu",
	"hidden_size": 3584,
	"initializer_range": 0.02,
	"intermediate_size": 18944,
	"max_position_embeddings": 32768,
	"max_window_layers": 28,
	"model_type": "qwen2",
	"num_attention_heads": 28,
	"num_hidden_layers": 28,
	"num_key_value_heads": 4,
	"rms_norm_eps": 1e-06,
	"rope_scaling": null,
	"rope_theta": 1000000.0,
	"sliding_window": null,
	"tie_word_embeddings": false,
	"torch_dtype": "bfloat16",
	"transformers_version": "4.46.1",
	"use_cache": true,
	"use_sliding_window": false,
	"vocab_size": 152064
	}


	[INFO|2025-03-26 10:33:44] tokenization_utils_base.py:2211 >> loading file vocab.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/vocab.json

	[INFO|2025-03-26 10:33:44] tokenization_utils_base.py:2211 >> loading file merges.txt from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/merges.txt

	[INFO|2025-03-26 10:33:44] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/tokenizer.json

	[INFO|2025-03-26 10:33:44] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None

	[INFO|2025-03-26 10:33:44] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at None

	[INFO|2025-03-26 10:33:44] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/tokenizer_config.json

	[INFO|2025-03-26 10:33:45] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

	[INFO|2025-03-26 10:33:45] logging.py:157 >> Loading dataset new-datasets/solidity_v3.json...

	[INFO|2025-03-26 10:34:08] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/config.json

	[INFO|2025-03-26 10:34:08] configuration_utils.py:746 >> Model config Qwen2Config {
	"_name_or_path": "Qwen/Qwen2.5-Coder-7B",
	"architectures": [
	"Qwen2ForCausalLM"
	],
	"attention_dropout": 0.0,
	"bos_token_id": 151643,
	"eos_token_id": 151643,
	"hidden_act": "silu",
	"hidden_size": 3584,
	"initializer_range": 0.02,
	"intermediate_size": 18944,
	"max_position_embeddings": 32768,
	"max_window_layers": 28,
	"model_type": "qwen2",
	"num_attention_heads": 28,
	"num_hidden_layers": 28,
	"num_key_value_heads": 4,
	"rms_norm_eps": 1e-06,
	"rope_scaling": null,
	"rope_theta": 1000000.0,
	"sliding_window": null,
	"tie_word_embeddings": false,
	"torch_dtype": "bfloat16",
	"transformers_version": "4.46.1",
	"use_cache": true,
	"use_sliding_window": false,
	"vocab_size": 152064
	}


	[INFO|2025-03-26 10:34:08] modeling_utils.py:3937 >> loading weights file model.safetensors from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/model.safetensors.index.json

	[INFO|2025-03-26 10:34:08] modeling_utils.py:1670 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.

	[INFO|2025-03-26 10:34:08] configuration_utils.py:1096 >> Generate config GenerationConfig {
	"bos_token_id": 151643,
	"eos_token_id": 151643
	}


	[INFO|2025-03-26 10:34:14] modeling_utils.py:4800 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.


	[INFO|2025-03-26 10:34:14] modeling_utils.py:4808 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2.5-Coder-7B.
	If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.

	[INFO|2025-03-26 10:34:14] configuration_utils.py:1051 >> loading configuration file generation_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/generation_config.json

	[INFO|2025-03-26 10:34:14] configuration_utils.py:1096 >> Generate config GenerationConfig {
	"bos_token_id": 151643,
	"eos_token_id": 151643,
	"max_new_tokens": 2048
	}


	[INFO|2025-03-26 10:34:14] logging.py:157 >> Gradient checkpointing enabled.

	[INFO|2025-03-26 10:34:14] logging.py:157 >> Using torch SDPA for faster training and inference.

	[INFO|2025-03-26 10:34:14] logging.py:157 >> Upcasting trainable params to float32.

	[INFO|2025-03-26 10:34:14] logging.py:157 >> Fine-tuning method: LoRA

	[INFO|2025-03-26 10:34:14] logging.py:157 >> Found linear modules: q_proj,k_proj,up_proj,down_proj,gate_proj,o_proj,v_proj

	[INFO|2025-03-26 10:34:15] logging.py:157 >> trainable params: 40,370,176 || all params: 7,655,986,688 || trainable%: 0.5273

	[INFO|2025-03-26 10:34:15] trainer.py:698 >> Using auto half precision backend

	[INFO|2025-03-26 10:34:15] trainer.py:2313 >> *** Running training ***

	[INFO|2025-03-26 10:34:15] trainer.py:2314 >> Num examples = 38,495

	[INFO|2025-03-26 10:34:15] trainer.py:2315 >> Num Epochs = 1

	[INFO|2025-03-26 10:34:15] trainer.py:2316 >> Instantaneous batch size per device = 8

	[INFO|2025-03-26 10:34:15] trainer.py:2319 >> Total train batch size (w. parallel, distributed & accumulation) = 192

	[INFO|2025-03-26 10:34:15] trainer.py:2320 >> Gradient Accumulation steps = 12

	[INFO|2025-03-26 10:34:15] trainer.py:2321 >> Total optimization steps = 200

	[INFO|2025-03-26 10:34:15] trainer.py:2322 >> Number of trainable parameters = 40,370,176

	[INFO|2025-03-26 10:37:04] logging.py:157 >> {'loss': 0.4044, 'learning_rate': 4.9923e-05, 'epoch': 0.02}

	[INFO|2025-03-26 10:39:53] logging.py:157 >> {'loss': 0.3910, 'learning_rate': 4.9692e-05, 'epoch': 0.05}

	[INFO|2025-03-26 10:42:42] logging.py:157 >> {'loss': 0.3583, 'learning_rate': 4.9309e-05, 'epoch': 0.07}

	[INFO|2025-03-26 10:45:30] logging.py:157 >> {'loss': 0.3771, 'learning_rate': 4.8776e-05, 'epoch': 0.10}

	[INFO|2025-03-26 10:48:19] logging.py:157 >> {'loss': 0.3408, 'learning_rate': 4.8097e-05, 'epoch': 0.12}

	[INFO|2025-03-26 10:51:08] logging.py:157 >> {'loss': 0.3665, 'learning_rate': 4.7275e-05, 'epoch': 0.15}

	[INFO|2025-03-26 10:53:56] logging.py:157 >> {'loss': 0.3508, 'learning_rate': 4.6316e-05, 'epoch': 0.17}

	[INFO|2025-03-26 10:56:44] logging.py:157 >> {'loss': 0.3558, 'learning_rate': 4.5225e-05, 'epoch': 0.20}

	[INFO|2025-03-26 10:59:33] logging.py:157 >> {'loss': 0.3579, 'learning_rate': 4.4010e-05, 'epoch': 0.22}

	[INFO|2025-03-26 11:02:21] logging.py:157 >> {'loss': 0.3333, 'learning_rate': 4.2678e-05, 'epoch': 0.25}

	[INFO|2025-03-26 11:05:10] logging.py:157 >> {'loss': 0.3263, 'learning_rate': 4.1236e-05, 'epoch': 0.27}

	[INFO|2025-03-26 11:07:58] logging.py:157 >> {'loss': 0.3476, 'learning_rate': 3.9695e-05, 'epoch': 0.30}

	[INFO|2025-03-26 11:10:47] logging.py:157 >> {'loss': 0.3294, 'learning_rate': 3.8062e-05, 'epoch': 0.32}

	[INFO|2025-03-26 11:13:35] logging.py:157 >> {'loss': 0.3560, 'learning_rate': 3.6350e-05, 'epoch': 0.35}

	[INFO|2025-03-26 11:16:24] logging.py:157 >> {'loss': 0.3304, 'learning_rate': 3.4567e-05, 'epoch': 0.37}

	[INFO|2025-03-26 11:19:13] logging.py:157 >> {'loss': 0.3436, 'learning_rate': 3.2725e-05, 'epoch': 0.40}

	[INFO|2025-03-26 11:22:01] logging.py:157 >> {'loss': 0.3388, 'learning_rate': 3.0836e-05, 'epoch': 0.42}

	[INFO|2025-03-26 11:24:49] logging.py:157 >> {'loss': 0.3367, 'learning_rate': 2.8911e-05, 'epoch': 0.45}

	[INFO|2025-03-26 11:27:38] logging.py:157 >> {'loss': 0.3561, 'learning_rate': 2.6961e-05, 'epoch': 0.47}

	[INFO|2025-03-26 11:30:26] logging.py:157 >> {'loss': 0.3182, 'learning_rate': 2.5000e-05, 'epoch': 0.50}

	[INFO|2025-03-26 11:30:26] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2.5-Coder-7B/lora/solidity_qwen_model/checkpoint-100

	[INFO|2025-03-26 11:30:27] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/config.json

	[INFO|2025-03-26 11:30:27] configuration_utils.py:746 >> Model config Qwen2Config {
	"architectures": [
	"Qwen2ForCausalLM"
	],
	"attention_dropout": 0.0,
	"bos_token_id": 151643,
	"eos_token_id": 151643,
	"hidden_act": "silu",
	"hidden_size": 3584,
	"initializer_range": 0.02,
	"intermediate_size": 18944,
	"max_position_embeddings": 32768,
	"max_window_layers": 28,
	"model_type": "qwen2",
	"num_attention_heads": 28,
	"num_hidden_layers": 28,
	"num_key_value_heads": 4,
	"rms_norm_eps": 1e-06,
	"rope_scaling": null,
	"rope_theta": 1000000.0,
	"sliding_window": null,
	"tie_word_embeddings": false,
	"torch_dtype": "bfloat16",
	"transformers_version": "4.46.1",
	"use_cache": true,
	"use_sliding_window": false,
	"vocab_size": 152064
	}


	[INFO|2025-03-26 11:30:27] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2.5-Coder-7B/lora/solidity_qwen_model/checkpoint-100/tokenizer_config.json

	[INFO|2025-03-26 11:30:27] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2.5-Coder-7B/lora/solidity_qwen_model/checkpoint-100/special_tokens_map.json

	[INFO|2025-03-26 11:33:17] logging.py:157 >> {'loss': 0.3331, 'learning_rate': 2.3039e-05, 'epoch': 0.52}

	[INFO|2025-03-26 11:36:05] logging.py:157 >> {'loss': 0.3383, 'learning_rate': 2.1089e-05, 'epoch': 0.55}

	[INFO|2025-03-26 11:38:54] logging.py:157 >> {'loss': 0.3366, 'learning_rate': 1.9164e-05, 'epoch': 0.57}

	[INFO|2025-03-26 11:41:43] logging.py:157 >> {'loss': 0.3324, 'learning_rate': 1.7275e-05, 'epoch': 0.60}

	[INFO|2025-03-26 11:44:31] logging.py:157 >> {'loss': 0.3030, 'learning_rate': 1.5433e-05, 'epoch': 0.62}

	[INFO|2025-03-26 11:47:20] logging.py:157 >> {'loss': 0.3500, 'learning_rate': 1.3650e-05, 'epoch': 0.65}

	[INFO|2025-03-26 11:50:08] logging.py:157 >> {'loss': 0.3310, 'learning_rate': 1.1938e-05, 'epoch': 0.67}

	[INFO|2025-03-26 11:52:57] logging.py:157 >> {'loss': 0.3440, 'learning_rate': 1.0305e-05, 'epoch': 0.70}

	[INFO|2025-03-26 11:55:45] logging.py:157 >> {'loss': 0.3567, 'learning_rate': 8.7638e-06, 'epoch': 0.72}

	[INFO|2025-03-26 11:58:34] logging.py:157 >> {'loss': 0.3184, 'learning_rate': 7.3223e-06, 'epoch': 0.75}

	[INFO|2025-03-26 12:01:21] logging.py:157 >> {'loss': 0.3279, 'learning_rate': 5.9899e-06, 'epoch': 0.77}

	[INFO|2025-03-26 12:04:10] logging.py:157 >> {'loss': 0.3372, 'learning_rate': 4.7746e-06, 'epoch': 0.80}

	[INFO|2025-03-26 12:06:58] logging.py:157 >> {'loss': 0.3553, 'learning_rate': 3.6840e-06, 'epoch': 0.82}

	[INFO|2025-03-26 12:09:47] logging.py:157 >> {'loss': 0.3361, 'learning_rate': 2.7248e-06, 'epoch': 0.85}

	[INFO|2025-03-26 12:12:36] logging.py:157 >> {'loss': 0.3335, 'learning_rate': 1.9030e-06, 'epoch': 0.87}

	[INFO|2025-03-26 12:15:25] logging.py:157 >> {'loss': 0.3425, 'learning_rate': 1.2236e-06, 'epoch': 0.90}

	[INFO|2025-03-26 12:18:12] logging.py:157 >> {'loss': 0.3247, 'learning_rate': 6.9075e-07, 'epoch': 0.92}

	[INFO|2025-03-26 12:21:01] logging.py:157 >> {'loss': 0.3314, 'learning_rate': 3.0779e-07, 'epoch': 0.95}

	[INFO|2025-03-26 12:23:49] logging.py:157 >> {'loss': 0.3354, 'learning_rate': 7.7067e-08, 'epoch': 0.97}

	[INFO|2025-03-26 12:26:38] logging.py:157 >> {'loss': 0.3413, 'learning_rate': 0.0000e+00, 'epoch': 1.00}

	[INFO|2025-03-26 12:26:38] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2.5-Coder-7B/lora/solidity_qwen_model/checkpoint-200

	[INFO|2025-03-26 12:26:38] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/config.json

	[INFO|2025-03-26 12:26:38] configuration_utils.py:746 >> Model config Qwen2Config {
	"architectures": [
	"Qwen2ForCausalLM"
	],
	"attention_dropout": 0.0,
	"bos_token_id": 151643,
	"eos_token_id": 151643,
	"hidden_act": "silu",
	"hidden_size": 3584,
	"initializer_range": 0.02,
	"intermediate_size": 18944,
	"max_position_embeddings": 32768,
	"max_window_layers": 28,
	"model_type": "qwen2",
	"num_attention_heads": 28,
	"num_hidden_layers": 28,
	"num_key_value_heads": 4,
	"rms_norm_eps": 1e-06,
	"rope_scaling": null,
	"rope_theta": 1000000.0,
	"sliding_window": null,
	"tie_word_embeddings": false,
	"torch_dtype": "bfloat16",
	"transformers_version": "4.46.1",
	"use_cache": true,
	"use_sliding_window": false,
	"vocab_size": 152064
	}


	[INFO|2025-03-26 12:26:39] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2.5-Coder-7B/lora/solidity_qwen_model/checkpoint-200/tokenizer_config.json

	[INFO|2025-03-26 12:26:39] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2.5-Coder-7B/lora/solidity_qwen_model/checkpoint-200/special_tokens_map.json

	[INFO|2025-03-26 12:26:39] trainer.py:2584 >>

	Training completed. Do not forget to share your model on huggingface.co/models =)



	[INFO|2025-03-26 12:26:39] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2.5-Coder-7B/lora/solidity_qwen_model

	[INFO|2025-03-26 12:26:40] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B/snapshots/0396a76181e127dfc13e5c5ec48a8cee09938b02/config.json

	[INFO|2025-03-26 12:26:40] configuration_utils.py:746 >> Model config Qwen2Config {
	"architectures": [
	"Qwen2ForCausalLM"
	],
	"attention_dropout": 0.0,
	"bos_token_id": 151643,
	"eos_token_id": 151643,
	"hidden_act": "silu",
	"hidden_size": 3584,
	"initializer_range": 0.02,
	"intermediate_size": 18944,
	"max_position_embeddings": 32768,
	"max_window_layers": 28,
	"model_type": "qwen2",
	"num_attention_heads": 28,
	"num_hidden_layers": 28,
	"num_key_value_heads": 4,
	"rms_norm_eps": 1e-06,
	"rope_scaling": null,
	"rope_theta": 1000000.0,
	"sliding_window": null,
	"tie_word_embeddings": false,
	"torch_dtype": "bfloat16",
	"transformers_version": "4.46.1",
	"use_cache": true,
	"use_sliding_window": false,
	"vocab_size": 152064
	}


	[INFO|2025-03-26 12:26:40] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2.5-Coder-7B/lora/solidity_qwen_model/tokenizer_config.json

	[INFO|2025-03-26 12:26:40] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2.5-Coder-7B/lora/solidity_qwen_model/special_tokens_map.json

	[WARNING|2025-03-26 12:26:40] logging.py:162 >> No metric eval_loss to plot.

	[WARNING|2025-03-26 12:26:40] logging.py:162 >> No metric eval_accuracy to plot.

	[INFO|2025-03-26 12:26:40] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields:
	{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}