[INFO|2025-07-09 20:08:23] configuration_utils.py:696 >> loading configuration file config.json from cache at /home/kiho/.cache/huggingface/hub/models--deepseek-ai--deepseek-coder-7b-instruct-v1.5/snapshots/2a050a4c59d687a85324d32e147517992117ed30/config.json
[INFO|2025-07-09 20:08:23] configuration_utils.py:768 >> Model config LlamaConfig {
"_name_or_path": "deepseek-ai/deepseek-coder-7b-instruct-v1.5",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 100000,
"eos_token_id": 100015,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 30,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"vocab_size": 102400
}
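Note: the cached configuration above can be reproduced locally with transformers (the config was serialized with version 4.48.2); a minimal sketch, assuming the Hugging Face Hub is reachable:

# Minimal sketch: reload the same LlamaConfig from the Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepseek-ai/deepseek-coder-7b-instruct-v1.5")
print(config.model_type)         # "llama"
print(config.num_hidden_layers)  # 30
print(config.vocab_size)         # 102400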
[INFO|2025-07-09 20:08:23] tokenization_utils_base.py:2034 >> loading file tokenizer.model from cache at None
[INFO|2025-07-09 20:08:23] tokenization_utils_base.py:2034 >> loading file tokenizer.json from cache at /home/kiho/.cache/huggingface/hub/models--deepseek-ai--deepseek-coder-7b-instruct-v1.5/snapshots/2a050a4c59d687a85324d32e147517992117ed30/tokenizer.json
[INFO|2025-07-09 20:08:23] tokenization_utils_base.py:2034 >> loading file added_tokens.json from cache at None
[INFO|2025-07-09 20:08:23] tokenization_utils_base.py:2034 >> loading file special_tokens_map.json from cache at None
[INFO|2025-07-09 20:08:23] tokenization_utils_base.py:2034 >> loading file tokenizer_config.json from cache at /home/kiho/.cache/huggingface/hub/models--deepseek-ai--deepseek-coder-7b-instruct-v1.5/snapshots/2a050a4c59d687a85324d32e147517992117ed30/tokenizer_config.json
[INFO|2025-07-09 20:08:23] tokenization_utils_base.py:2034 >> loading file chat_template.jinja from cache at None
[INFO|2025-07-09 20:08:24] tokenization_utils_base.py:2304 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
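Note: only tokenizer.json and tokenizer_config.json resolve to cached files above (tokenizer.model, added_tokens.json, special_tokens_map.json and chat_template.jinja resolve to None), so the fast tokenizer is what gets loaded; a minimal sketch:

# Minimal sketch: load the fast tokenizer whose cached files are listed above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-7b-instruct-v1.5")
ids = tokenizer("def add(a, b):\n    return a + b")["input_ids"]
print(len(ids), tokenizer.decode(ids, skip_special_tokens=True))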
[INFO|2025-07-09 20:08:25] configuration_utils.py:696 >> loading configuration file config.json from cache at /home/kiho/.cache/huggingface/hub/models--deepseek-ai--deepseek-coder-7b-instruct-v1.5/snapshots/2a050a4c59d687a85324d32e147517992117ed30/config.json
[INFO|2025-07-09 20:08:25] configuration_utils.py:768 >> Model config LlamaConfig {
"_name_or_path": "deepseek-ai/deepseek-coder-7b-instruct-v1.5",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 100000,
"eos_token_id": 100015,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 30,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"vocab_size": 102400
}
[INFO|2025-07-09 20:08:26] tokenization_utils_base.py:2034 >> loading file tokenizer.model from cache at None
[INFO|2025-07-09 20:08:26] tokenization_utils_base.py:2034 >> loading file tokenizer.json from cache at /home/kiho/.cache/huggingface/hub/models--deepseek-ai--deepseek-coder-7b-instruct-v1.5/snapshots/2a050a4c59d687a85324d32e147517992117ed30/tokenizer.json
[INFO|2025-07-09 20:08:26] tokenization_utils_base.py:2034 >> loading file added_tokens.json from cache at None
[INFO|2025-07-09 20:08:26] tokenization_utils_base.py:2034 >> loading file special_tokens_map.json from cache at None
[INFO|2025-07-09 20:08:26] tokenization_utils_base.py:2034 >> loading file tokenizer_config.json from cache at /home/kiho/.cache/huggingface/hub/models--deepseek-ai--deepseek-coder-7b-instruct-v1.5/snapshots/2a050a4c59d687a85324d32e147517992117ed30/tokenizer_config.json
[INFO|2025-07-09 20:08:26] tokenization_utils_base.py:2034 >> loading file chat_template.jinja from cache at None
[INFO|2025-07-09 20:08:26] tokenization_utils_base.py:2304 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-07-09 20:08:26] logging.py:157 >> Loading dataset Codes3_query_filtered_553474_mark_less_than_8.0.json...
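Note: Codes3_query_filtered_553474_mark_less_than_8.0.json is a project-specific training file whose schema is not shown in this log. The hypothetical record below only illustrates the alpaca-style instruction/input/output layout commonly used for LLaMA-Factory SFT data; the field names here are an assumption.

# Hypothetical record shape (assumption): alpaca-style fields often used for
# LLaMA-Factory SFT data. The real file's schema is not shown in this log.
import json

example = {
    "instruction": "Write a Python function that reverses a string.",
    "input": "",
    "output": "def reverse_string(s: str) -> str:\n    return s[::-1]",
}
print(json.dumps([example], indent=2))  # such files are commonly a JSON list of records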
[INFO|2025-07-09 20:09:10] configuration_utils.py:696 >> loading configuration file config.json from cache at /home/kiho/.cache/huggingface/hub/models--deepseek-ai--deepseek-coder-7b-instruct-v1.5/snapshots/2a050a4c59d687a85324d32e147517992117ed30/config.json
[INFO|2025-07-09 20:09:10] configuration_utils.py:768 >> Model config LlamaConfig {
"_name_or_path": "deepseek-ai/deepseek-coder-7b-instruct-v1.5",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 100000,
"eos_token_id": 100015,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 30,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"vocab_size": 102400
}
[WARNING|2025-07-09 20:09:10] logging.py:162 >> Input length is smaller than max length. Consider increasing the input length.
[INFO|2025-07-09 20:09:10] logging.py:157 >> Using llama3 scaling strategy and setting scaling factor to 1.0.
[INFO|2025-07-09 20:09:10] logging.py:157 >> Using block diagonal attention for sequence packing without cross-attention.
[INFO|2025-07-09 20:09:11] logging.py:157 >> Liger kernel has been applied to the model.
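Note: "block diagonal attention for sequence packing" above means several training samples are packed into one sequence while tokens are kept from attending across sample boundaries. The sketch below is an illustration of that masking idea only (not LLaMA-Factory's actual implementation), building a causal block-diagonal mask from the packed segment lengths:

# Illustration only: a causal, block-diagonal attention mask for a packed
# sequence, so each token attends only to earlier tokens of its own segment.
import torch

def block_diagonal_causal_mask(seq_lens):
    total = sum(seq_lens)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for n in seq_lens:
        mask[start:start + n, start:start + n] = torch.tril(torch.ones(n, n, dtype=torch.bool))
        start += n
    return mask  # True = attention allowed

print(block_diagonal_causal_mask([3, 2]).int())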
[INFO|2025-07-09 20:09:11] modeling_utils.py:3904 >> loading weights file model.safetensors from cache at /home/kiho/.cache/huggingface/hub/models--deepseek-ai--deepseek-coder-7b-instruct-v1.5/snapshots/2a050a4c59d687a85324d32e147517992117ed30/model.safetensors.index.json
[INFO|2025-07-09 20:09:11] modeling_utils.py:1582 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|2025-07-09 20:09:11] configuration_utils.py:1140 >> Generate config GenerationConfig {
"bos_token_id": 100000,
"eos_token_id": 100015
}
[INFO|2025-07-09 20:09:14] modeling_utils.py:4888 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|2025-07-09 20:09:14] modeling_utils.py:4896 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at deepseek-ai/deepseek-coder-7b-instruct-v1.5.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|2025-07-09 20:09:14] configuration_utils.py:1095 >> loading configuration file generation_config.json from cache at /home/kiho/.cache/huggingface/hub/models--deepseek-ai--deepseek-coder-7b-instruct-v1.5/snapshots/2a050a4c59d687a85324d32e147517992117ed30/generation_config.json
[INFO|2025-07-09 20:09:14] configuration_utils.py:1140 >> Generate config GenerationConfig {
"bos_token_id": 100000,
"eos_token_id": 100015
}
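Note: the weight-loading lines above amount to instantiating LlamaForCausalLM in bfloat16 from the sharded safetensors checkpoint; a minimal sketch, assuming roughly 14 GB of memory for the ~6.9B bf16 parameters:

# Minimal sketch: load the base model in bfloat16, as the trainer does above.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-7b-instruct-v1.5",
    torch_dtype=torch.bfloat16,
)
print(sum(p.numel() for p in model.parameters()))  # 6,910,365,696 per the log below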
[INFO|2025-07-09 20:09:14] logging.py:157 >> Gradient checkpointing enabled.
[INFO|2025-07-09 20:09:14] logging.py:157 >> Using torch SDPA for faster training and inference.
[INFO|2025-07-09 20:09:14] logging.py:157 >> Upcasting trainable params to float32.
[INFO|2025-07-09 20:09:14] logging.py:157 >> Fine-tuning method: Freeze
[INFO|2025-07-09 20:09:14] logging.py:157 >> Set trainable layers: .14.,.29.
[INFO|2025-07-09 20:09:14] logging.py:157 >> trainable params: 404,766,720 || all params: 6,910,365,696 || trainable%: 5.8574
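Note: "Freeze" fine-tuning here trains only the decoder layers whose parameter names contain ".14." or ".29." (layers 14 and 29); everything else stays frozen. The logged counts follow directly from the config above, as the arithmetic sketch below checks: each decoder layer has 4*4096^2 attention weights, 3*4096*11008 MLP weights and 2*4096 RMSNorm weights, and the untied input/output embeddings add 2*102400*4096.

# Sketch: reproduce the trainable/total parameter counts reported above
# from the LlamaConfig values (hidden 4096, intermediate 11008, 30 layers).
hidden, inter, vocab, n_layers = 4096, 11008, 102400, 30

attn = 4 * hidden * hidden       # q_proj, k_proj, v_proj, o_proj (attention_bias = false)
mlp = 3 * hidden * inter         # gate_proj, up_proj, down_proj (mlp_bias = false)
norms = 2 * hidden               # input_layernorm, post_attention_layernorm
per_layer = attn + mlp + norms   # 202,383,360

trainable = 2 * per_layer                                    # layers 14 and 29 only
total = n_layers * per_layer + 2 * vocab * hidden + hidden   # + embeddings, lm_head, final norm
print(f"{trainable:,} {total:,} {100 * trainable / total:.4f}")
# 404,766,720 6,910,365,696 5.8574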
[INFO|2025-07-09 20:09:14] trainer.py:741 >> Using auto half precision backend
[INFO|2025-07-09 20:09:14] logging.py:157 >> Found linear modules: up_proj,k_proj,gate_proj,down_proj,o_proj,q_proj,v_proj
[INFO|2025-07-09 20:09:14] logging.py:157 >> Using APOLLO optimizer with args: {'rank': 256, 'proj': 'random', 'proj_type': 'std', 'update_proj_gap': 200, 'scale': 1, 'scale_type': 'channel', 'scale_front': False}.
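Note: the "linear modules" listed above are the per-layer nn.Linear projections, presumably the targets for the APOLLO optimizer (a memory-efficient optimizer using a rank-256 random projection, per the logged arguments). They can be enumerated without downloading any weights by building the architecture on the meta device; a sketch, assuming a recent PyTorch where the meta-device context is supported:

# Sketch: list the distinct nn.Linear projection names inside the decoder
# layers, matching the "Found linear modules" line above. The meta device
# builds the architecture without allocating real weights.
import torch
import torch.nn as nn
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("deepseek-ai/deepseek-coder-7b-instruct-v1.5")
with torch.device("meta"):
    skeleton = AutoModelForCausalLM.from_config(config)

names = {name.split(".")[-1] for name, module in skeleton.named_modules()
         if isinstance(module, nn.Linear) and ".layers." in name}
print(sorted(names))
# ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']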
[INFO|2025-07-09 20:09:15] trainer.py:2369 >> ***** Running training *****
[INFO|2025-07-09 20:09:15] trainer.py:2370 >> Num examples = 32,858
[INFO|2025-07-09 20:09:15] trainer.py:2371 >> Num Epochs = 1
[INFO|2025-07-09 20:09:15] trainer.py:2372 >> Instantaneous batch size per device = 16
[INFO|2025-07-09 20:09:15] trainer.py:2375 >> Total train batch size (w. parallel, distributed & accumulation) = 384
[INFO|2025-07-09 20:09:15] trainer.py:2376 >> Gradient Accumulation steps = 8
[INFO|2025-07-09 20:09:15] trainer.py:2377 >> Total optimization steps = 85
[INFO|2025-07-09 20:09:15] trainer.py:2378 >> Number of trainable parameters = 404,766,720
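Note: the summary above is internally consistent: 16 sequences per device times 8 gradient-accumulation steps per update with a total batch of 384 implies 3 devices, and one epoch over 32,858 packed examples gives 32,858 / 384, roughly 85.6, reported as 85 optimization steps. A quick check:

# Sketch: sanity-check the "Running training" summary above.
num_examples, per_device, grad_accum, total_batch = 32_858, 16, 8, 384

devices = total_batch // (per_device * grad_accum)  # -> 3 devices implied
print(devices, num_examples / total_batch)          # 3 85.57... (reported as 85 steps)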
[INFO|2025-07-09 20:11:56] logging.py:157 >> {'loss': 1.2528, 'learning_rate': 4.9983e-05, 'epoch': 0.01, 'throughput': 9835.11}
[INFO|2025-07-09 20:14:29] logging.py:157 >> {'loss': 1.1456, 'learning_rate': 4.9932e-05, 'epoch': 0.02, 'throughput': 10039.91}
[INFO|2025-07-09 20:17:02] logging.py:157 >> {'loss': 1.0528, 'learning_rate': 4.9846e-05, 'epoch': 0.04, 'throughput': 10116.78}
[INFO|2025-07-09 20:19:35] logging.py:157 >> {'loss': 0.9542, 'learning_rate': 4.9727e-05, 'epoch': 0.05, 'throughput': 10154.68}
[INFO|2025-07-09 20:22:08] logging.py:157 >> {'loss': 0.8888, 'learning_rate': 4.9574e-05, 'epoch': 0.06, 'throughput': 10175.92}
[INFO|2025-07-09 20:24:43] logging.py:157 >> {'loss': 0.8125, 'learning_rate': 4.9388e-05, 'epoch': 0.07, 'throughput': 10177.83}
[INFO|2025-07-09 20:27:16] logging.py:157 >> {'loss': 0.7639, 'learning_rate': 4.9168e-05, 'epoch': 0.08, 'throughput': 10187.10}
[INFO|2025-07-09 20:29:49] logging.py:157 >> {'loss': 0.7068, 'learning_rate': 4.8915e-05, 'epoch': 0.09, 'throughput': 10197.99}
[INFO|2025-07-09 20:32:23] logging.py:157 >> {'loss': 0.6852, 'learning_rate': 4.8630e-05, 'epoch': 0.11, 'throughput': 10205.83}
[INFO|2025-07-09 20:34:56] logging.py:157 >> {'loss': 0.6571, 'learning_rate': 4.8312e-05, 'epoch': 0.12, 'throughput': 10211.26}
[INFO|2025-07-09 20:37:29] logging.py:157 >> {'loss': 0.6347, 'learning_rate': 4.7962e-05, 'epoch': 0.13, 'throughput': 10216.60}
[INFO|2025-07-09 20:40:02] logging.py:157 >> {'loss': 0.6191, 'learning_rate': 4.7581e-05, 'epoch': 0.14, 'throughput': 10220.74}
[INFO|2025-07-09 20:42:36] logging.py:157 >> {'loss': 0.5761, 'learning_rate': 4.7169e-05, 'epoch': 0.15, 'throughput': 10224.20}
[INFO|2025-07-09 20:45:09] logging.py:157 >> {'loss': 0.5772, 'learning_rate': 4.6727e-05, 'epoch': 0.16, 'throughput': 10227.13}
[INFO|2025-07-09 20:47:42] logging.py:157 >> {'loss': 0.5579, 'learning_rate': 4.6255e-05, 'epoch': 0.18, 'throughput': 10229.70}
[INFO|2025-07-09 20:50:15] logging.py:157 >> {'loss': 0.5674, 'learning_rate': 4.5755e-05, 'epoch': 0.19, 'throughput': 10233.46}
[INFO|2025-07-09 20:52:48] logging.py:157 >> {'loss': 0.5767, 'learning_rate': 4.5225e-05, 'epoch': 0.20, 'throughput': 10235.03}
[INFO|2025-07-09 20:55:21] logging.py:157 >> {'loss': 0.5559, 'learning_rate': 4.4669e-05, 'epoch': 0.21, 'throughput': 10236.83}
[INFO|2025-07-09 20:57:55] logging.py:157 >> {'loss': 0.5603, 'learning_rate': 4.4085e-05, 'epoch': 0.22, 'throughput': 10238.16}
[INFO|2025-07-09 21:00:28] logging.py:157 >> {'loss': 0.5541, 'learning_rate': 4.3475e-05, 'epoch': 0.23, 'throughput': 10239.41}
[INFO|2025-07-09 21:03:01] logging.py:157 >> {'loss': 0.5358, 'learning_rate': 4.2840e-05, 'epoch': 0.25, 'throughput': 10240.84}
[INFO|2025-07-09 21:05:34] logging.py:157 >> {'loss': 0.5290, 'learning_rate': 4.2181e-05, 'epoch': 0.26, 'throughput': 10242.63}
[INFO|2025-07-09 21:08:07] logging.py:157 >> {'loss': 0.5391, 'learning_rate': 4.1498e-05, 'epoch': 0.27, 'throughput': 10243.79}
[INFO|2025-07-09 21:10:40] logging.py:157 >> {'loss': 0.5368, 'learning_rate': 4.0793e-05, 'epoch': 0.28, 'throughput': 10244.74}
[INFO|2025-07-09 21:13:14] logging.py:157 >> {'loss': 0.5226, 'learning_rate': 4.0066e-05, 'epoch': 0.29, 'throughput': 10245.36}
[INFO|2025-07-09 21:15:47] logging.py:157 >> {'loss': 0.5181, 'learning_rate': 3.9318e-05, 'epoch': 0.30, 'throughput': 10245.84}
[INFO|2025-07-09 21:18:20] logging.py:157 >> {'loss': 0.5368, 'learning_rate': 3.8551e-05, 'epoch': 0.32, 'throughput': 10246.61}
[INFO|2025-07-09 21:20:53] logging.py:157 >> {'loss': 0.5143, 'learning_rate': 3.7766e-05, 'epoch': 0.33, 'throughput': 10247.05}
[INFO|2025-07-09 21:23:27] logging.py:157 >> {'loss': 0.5084, 'learning_rate': 3.6963e-05, 'epoch': 0.34, 'throughput': 10247.38}
[INFO|2025-07-09 21:26:00] logging.py:157 >> {'loss': 0.5026, 'learning_rate': 3.6143e-05, 'epoch': 0.35, 'throughput': 10247.73}
[INFO|2025-07-09 21:28:33] logging.py:157 >> {'loss': 0.5300, 'learning_rate': 3.5309e-05, 'epoch': 0.36, 'throughput': 10248.10}
[INFO|2025-07-09 21:31:07] logging.py:157 >> {'loss': 0.5320, 'learning_rate': 3.4460e-05, 'epoch': 0.37, 'throughput': 10248.71}
[INFO|2025-07-09 21:33:40] logging.py:157 >> {'loss': 0.5257, 'learning_rate': 3.3599e-05, 'epoch': 0.39, 'throughput': 10248.92}
[INFO|2025-07-09 21:36:13] logging.py:157 >> {'loss': 0.5078, 'learning_rate': 3.2725e-05, 'epoch': 0.40, 'throughput': 10249.40}
[INFO|2025-07-09 21:38:46] logging.py:157 >> {'loss': 0.5012, 'learning_rate': 3.1842e-05, 'epoch': 0.41, 'throughput': 10249.82}
[INFO|2025-07-09 21:41:20] logging.py:157 >> {'loss': 0.5117, 'learning_rate': 3.0948e-05, 'epoch': 0.42, 'throughput': 10250.03}
[INFO|2025-07-09 21:43:53] logging.py:157 >> {'loss': 0.5160, 'learning_rate': 3.0047e-05, 'epoch': 0.43, 'throughput': 10250.39}
[INFO|2025-07-09 21:46:26] logging.py:157 >> {'loss': 0.5189, 'learning_rate': 2.9139e-05, 'epoch': 0.44, 'throughput': 10250.64}
[INFO|2025-07-09 21:49:00] logging.py:157 >> {'loss': 0.4973, 'learning_rate': 2.8225e-05, 'epoch': 0.46, 'throughput': 10251.16}
[INFO|2025-07-09 21:51:33] logging.py:157 >> {'loss': 0.4942, 'learning_rate': 2.7307e-05, 'epoch': 0.47, 'throughput': 10251.77}
[INFO|2025-07-09 21:54:06] logging.py:157 >> {'loss': 0.5069, 'learning_rate': 2.6385e-05, 'epoch': 0.48, 'throughput': 10252.24}
[INFO|2025-07-09 21:56:39] logging.py:157 >> {'loss': 0.5218, 'learning_rate': 2.5462e-05, 'epoch': 0.49, 'throughput': 10252.43}
[INFO|2025-07-09 21:59:12] logging.py:157 >> {'loss': 0.5003, 'learning_rate': 2.4538e-05, 'epoch': 0.50, 'throughput': 10252.63}
[INFO|2025-07-09 22:01:45] logging.py:157 >> {'loss': 0.4997, 'learning_rate': 2.3615e-05, 'epoch': 0.51, 'throughput': 10253.02}
[INFO|2025-07-09 22:04:19] logging.py:157 >> {'loss': 0.5045, 'learning_rate': 2.2693e-05, 'epoch': 0.53, 'throughput': 10253.25}
[INFO|2025-07-09 22:06:52] logging.py:157 >> {'loss': 0.4972, 'learning_rate': 2.1775e-05, 'epoch': 0.54, 'throughput': 10253.46}
[INFO|2025-07-09 22:09:25] logging.py:157 >> {'loss': 0.5174, 'learning_rate': 2.0861e-05, 'epoch': 0.55, 'throughput': 10253.76}
[INFO|2025-07-09 22:11:58] logging.py:157 >> {'loss': 0.5086, 'learning_rate': 1.9953e-05, 'epoch': 0.56, 'throughput': 10254.17}
[INFO|2025-07-09 22:14:32] logging.py:157 >> {'loss': 0.4988, 'learning_rate': 1.9052e-05, 'epoch': 0.57, 'throughput': 10254.33}
[INFO|2025-07-09 22:17:05] logging.py:157 >> {'loss': 0.5047, 'learning_rate': 1.8158e-05, 'epoch': 0.58, 'throughput': 10254.36}
[INFO|2025-07-09 22:19:38] logging.py:157 >> {'loss': 0.4977, 'learning_rate': 1.7275e-05, 'epoch': 0.60, 'throughput': 10254.39}
[INFO|2025-07-09 22:22:11] logging.py:157 >> {'loss': 0.4775, 'learning_rate': 1.6401e-05, 'epoch': 0.61, 'throughput': 10254.65}
[INFO|2025-07-09 22:24:45] logging.py:157 >> {'loss': 0.5190, 'learning_rate': 1.5540e-05, 'epoch': 0.62, 'throughput': 10254.75}
[INFO|2025-07-09 22:27:18] logging.py:157 >> {'loss': 0.5102, 'learning_rate': 1.4691e-05, 'epoch': 0.63, 'throughput': 10254.76}
[INFO|2025-07-09 22:29:51] logging.py:157 >> {'loss': 0.4705, 'learning_rate': 1.3857e-05, 'epoch': 0.64, 'throughput': 10254.79}
[INFO|2025-07-09 22:32:25] logging.py:157 >> {'loss': 0.4965, 'learning_rate': 1.3037e-05, 'epoch': 0.65, 'throughput': 10254.68}
[INFO|2025-07-09 22:34:58] logging.py:157 >> {'loss': 0.5030, 'learning_rate': 1.2234e-05, 'epoch': 0.67, 'throughput': 10254.63}
[INFO|2025-07-09 22:37:32] logging.py:157 >> {'loss': 0.4921, 'learning_rate': 1.1449e-05, 'epoch': 0.68, 'throughput': 10254.63}
[INFO|2025-07-09 22:40:05] logging.py:157 >> {'loss': 0.5042, 'learning_rate': 1.0682e-05, 'epoch': 0.69, 'throughput': 10254.57}
[INFO|2025-07-09 22:42:38] logging.py:157 >> {'loss': 0.5145, 'learning_rate': 9.9341e-06, 'epoch': 0.70, 'throughput': 10254.68}
[INFO|2025-07-09 22:45:12] logging.py:157 >> {'loss': 0.4897, 'learning_rate': 9.2072e-06, 'epoch': 0.71, 'throughput': 10254.77}
[INFO|2025-07-09 22:47:45] logging.py:157 >> {'loss': 0.5033, 'learning_rate': 8.5019e-06, 'epoch': 0.72, 'throughput': 10254.85}
[INFO|2025-07-09 22:50:18] logging.py:157 >> {'loss': 0.4730, 'learning_rate': 7.8191e-06, 'epoch': 0.74, 'throughput': 10254.95}
[INFO|2025-07-09 22:52:52] logging.py:157 >> {'loss': 0.5298, 'learning_rate': 7.1597e-06, 'epoch': 0.75, 'throughput': 10255.10}
[INFO|2025-07-09 22:55:25] logging.py:157 >> {'loss': 0.4685, 'learning_rate': 6.5248e-06, 'epoch': 0.76, 'throughput': 10255.23}
[INFO|2025-07-09 22:57:58] logging.py:157 >> {'loss': 0.4931, 'learning_rate': 5.9150e-06, 'epoch': 0.77, 'throughput': 10255.33}
[INFO|2025-07-09 23:00:31] logging.py:157 >> {'loss': 0.4957, 'learning_rate': 5.3314e-06, 'epoch': 0.78, 'throughput': 10255.52}
[INFO|2025-07-09 23:03:04] logging.py:157 >> {'loss': 0.4924, 'learning_rate': 4.7746e-06, 'epoch': 0.79, 'throughput': 10255.76}
[INFO|2025-07-09 23:05:38] logging.py:157 >> {'loss': 0.5041, 'learning_rate': 4.2454e-06, 'epoch': 0.81, 'throughput': 10255.93}
[INFO|2025-07-09 23:08:11] logging.py:157 >> {'loss': 0.4948, 'learning_rate': 3.7446e-06, 'epoch': 0.82, 'throughput': 10256.07}
[INFO|2025-07-09 23:10:44] logging.py:157 >> {'loss': 0.5085, 'learning_rate': 3.2728e-06, 'epoch': 0.83, 'throughput': 10256.08}
[INFO|2025-07-09 23:13:17] logging.py:157 >> {'loss': 0.5152, 'learning_rate': 2.8307e-06, 'epoch': 0.84, 'throughput': 10256.66}
[INFO|2025-07-09 23:15:50] logging.py:157 >> {'loss': 0.4905, 'learning_rate': 2.4188e-06, 'epoch': 0.85, 'throughput': 10256.64}
[INFO|2025-07-09 23:18:24] logging.py:157 >> {'loss': 0.5023, 'learning_rate': 2.0378e-06, 'epoch': 0.86, 'throughput': 10256.67}
[INFO|2025-07-09 23:20:57] logging.py:157 >> {'loss': 0.4993, 'learning_rate': 1.6882e-06, 'epoch': 0.88, 'throughput': 10256.80}
[INFO|2025-07-09 23:23:30] logging.py:157 >> {'loss': 0.4810, 'learning_rate': 1.3704e-06, 'epoch': 0.89, 'throughput': 10256.78}
[INFO|2025-07-09 23:26:04] logging.py:157 >> {'loss': 0.4806, 'learning_rate': 1.0849e-06, 'epoch': 0.90, 'throughput': 10256.76}
[INFO|2025-07-09 23:28:37] logging.py:157 >> {'loss': 0.4915, 'learning_rate': 8.3204e-07, 'epoch': 0.91, 'throughput': 10256.95}
[INFO|2025-07-09 23:31:10] logging.py:157 >> {'loss': 0.4937, 'learning_rate': 6.1220e-07, 'epoch': 0.92, 'throughput': 10256.91}
[INFO|2025-07-09 23:33:43] logging.py:157 >> {'loss': 0.5152, 'learning_rate': 4.2567e-07, 'epoch': 0.93, 'throughput': 10256.86}
[INFO|2025-07-09 23:36:17] logging.py:157 >> {'loss': 0.4832, 'learning_rate': 2.7271e-07, 'epoch': 0.95, 'throughput': 10257.04}
[INFO|2025-07-09 23:38:50] logging.py:157 >> {'loss': 0.4975, 'learning_rate': 1.5352e-07, 'epoch': 0.96, 'throughput': 10257.12}
[INFO|2025-07-09 23:41:23] logging.py:157 >> {'loss': 0.4887, 'learning_rate': 6.8271e-08, 'epoch': 0.97, 'throughput': 10257.16}
[INFO|2025-07-09 23:43:56] logging.py:157 >> {'loss': 0.4905, 'learning_rate': 1.7073e-08, 'epoch': 0.98, 'throughput': 10257.17}
[INFO|2025-07-09 23:46:30] logging.py:157 >> {'loss': 0.5088, 'learning_rate': 0.0000e+00, 'epoch': 0.99, 'throughput': 10257.11}
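Note: the step records above show the loss falling from 1.2528 to around 0.49-0.51 over 85 steps at a steady ~10,250 tokens/s, while the learning rate decays from ~5e-5 to 0 on what looks like a cosine schedule. The metrics can be pulled out of this log for plotting with a small parser; a sketch, assuming the file keeps exactly the line format shown here and using a hypothetical local path:

# Sketch: extract the per-step metrics dicts from this log for plotting.
# "running_log.txt" is a hypothetical local path to this file.
import ast
import re

records = []
with open("running_log.txt") as f:
    for line in f:
        match = re.search(r">> (\{'loss'.*\})\s*$", line)
        if match:
            records.append(ast.literal_eval(match.group(1)))

print(len(records), records[0]["loss"], records[-1]["loss"])  # 85 1.2528 0.5088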
[INFO|2025-07-09 23:46:30] trainer.py:3910 >> Saving model checkpoint to saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/checkpoint-85
[INFO|2025-07-09 23:46:30] configuration_utils.py:420 >> Configuration saved in saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/checkpoint-85/config.json
[INFO|2025-07-09 23:46:30] configuration_utils.py:909 >> Configuration saved in saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/checkpoint-85/generation_config.json
[INFO|2025-07-09 23:46:51] modeling_utils.py:2996 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameter has been saved in the index located at saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/checkpoint-85/model.safetensors.index.json.
[INFO|2025-07-09 23:46:51] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/checkpoint-85/tokenizer_config.json
[INFO|2025-07-09 23:46:51] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/checkpoint-85/special_tokens_map.json
[INFO|2025-07-09 23:46:52] trainer.py:2643 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2025-07-09 23:46:52] trainer.py:3910 >> Saving model checkpoint to saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx
[INFO|2025-07-09 23:46:52] configuration_utils.py:420 >> Configuration saved in saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/config.json
[INFO|2025-07-09 23:46:52] configuration_utils.py:909 >> Configuration saved in saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/generation_config.json
[INFO|2025-07-09 23:47:16] modeling_utils.py:2996 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameter has been saved in the index located at saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/model.safetensors.index.json.
[INFO|2025-07-09 23:47:16] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/tokenizer_config.json
[INFO|2025-07-09 23:47:16] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx/special_tokens_map.json
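Note: with the final weights, configs, and tokenizer written to saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx, the fine-tuned model can be loaded back like any local transformers checkpoint. A minimal inference sketch, assuming accelerate is installed for device_map="auto" and that the chat template saved with the tokenizer is available:

# Minimal sketch: load the fine-tuned checkpoint saved above and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "saves/DeepSeek-Coder-7B-Instruct/freeze/deepseek_under8_nlx"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that checks whether a number is prime."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))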
[WARNING|2025-07-09 23:47:16] logging.py:162 >> No metric eval_loss to plot.
[WARNING|2025-07-09 23:47:16] logging.py:162 >> No metric eval_accuracy to plot.
[INFO|2025-07-09 23:47:16] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}