---
library_name: peft
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- generated_from_trainer
model-index:
- name: lemexp-hol-thms-by-file-deepseek-coder-1.3b-base
  results: []
---
|
|
|
|
|
|
# lemexp-hol-thms-by-file-deepseek-coder-1.3b-base |
|
|
|
This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset. |
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.2839 (final evaluation; the lowest validation loss reached during training was 0.2620, at roughly epoch 4.0)
|
|
|
## Model description |
|
|
|
This repository contains a PEFT adapter for [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base). Judging from the model name, it appears to target generating HOL theorem statements (lemma expressions) organized by file, but the training task is not otherwise documented.
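Because this repository contains only a PEFT adapter, it must be loaded on top of the base checkpoint. Below is a minimal loading sketch; the adapter id and the prompt are placeholders, not documented values:

```python
# pip install peft==0.14.0 transformers==4.47.0  (versions listed under "Framework versions")
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "deepseek-ai/deepseek-coder-1.3b-base"
ADAPTER_ID = "lemexp-hol-thms-by-file-deepseek-coder-1.3b-base"  # replace with this repo's full Hub id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(BASE_ID)

# Attach the adapter weights from this repository to the base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

# Hypothetical prompt; the expected input format is not documented.
inputs = tokenizer("theorem ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```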
|
|
|
## Intended uses & limitations |
|
|
|
As a PEFT adapter, this model cannot be used standalone; it must be loaded together with its base model (see the loading sketch above). Specific intended uses and limitations are otherwise undocumented.
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
|
- learning_rate: 0.0002 |
|
- train_batch_size: 16 |
|
- eval_batch_size: 16 |
|
- seed: 42 |
|
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 100 |
|
- num_epochs: 6 |
|
- mixed_precision_training: Native AMP |
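
The hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a hedged sketch: the output directory and the `fp16` flag are assumptions ("Native AMP" could also mean bf16), and the PEFT/LoRA configuration, dataset, and `Trainer` wiring are not documented, so they are omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lemexp-hol-thms-by-file-deepseek-coder-1.3b-base",  # assumed
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=6,
    fp16=True,  # "Native AMP" mixed-precision training; bf16 is also possible
)
```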
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:------:|:------:|:---------------:| |
|
| 0.4174 | 0.2000 | 3777 | 0.4078 | |
|
| 0.3400        | 0.4000 | 7554   | 0.3589          |
|
| 0.3104 | 0.6001 | 11331 | 0.3305 | |
|
| 0.2901 | 0.8001 | 15108 | 0.3113 | |
|
| 0.2823 | 1.0001 | 18885 | 0.2998 | |
|
| 0.2591 | 1.2001 | 22662 | 0.2958 | |
|
| 0.2597 | 1.4001 | 26439 | 0.2871 | |
|
| 0.2536 | 1.6002 | 30216 | 0.2895 | |
|
| 0.2353 | 1.8002 | 33993 | 0.2819 | |
|
| 0.2236 | 2.0002 | 37770 | 0.2784 | |
|
| 0.2055 | 2.2002 | 41547 | 0.2898 | |
|
| 0.2129 | 2.4003 | 45324 | 0.2687 | |
|
| 0.2001 | 2.6003 | 49101 | 0.2719 | |
|
| 0.2108 | 2.8003 | 52878 | 0.2714 | |
|
| 0.2023 | 3.0003 | 56655 | 0.2650 | |
|
| 0.1811 | 3.2003 | 60432 | 0.2709 | |
|
| 0.1702 | 3.4004 | 64209 | 0.2655 | |
|
| 0.1760        | 3.6004 | 67986  | 0.2665          |
|
| 0.1702 | 3.8004 | 71763 | 0.2666 | |
|
| 0.1597 | 4.0004 | 75540 | 0.2620 | |
|
| 0.1388 | 4.2004 | 79317 | 0.2704 | |
|
| 0.1452 | 4.4005 | 83094 | 0.2707 | |
|
| 0.1460        | 4.6005 | 86871  | 0.2705          |
|
| 0.1388 | 4.8005 | 90648 | 0.2622 | |
|
| 0.1435 | 5.0005 | 94425 | 0.2649 | |
|
| 0.1295 | 5.2006 | 98202 | 0.2794 | |
|
| 0.1195 | 5.4006 | 101979 | 0.2780 | |
|
| 0.1240        | 5.6006 | 105756 | 0.2796          |
|
| 0.1183 | 5.8006 | 109533 | 0.2839 | |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.14.0 |
|
- Transformers 4.47.0 |
|
- PyTorch 2.5.1+cu124
|
- Datasets 3.2.0 |
|
- Tokenizers 0.21.0 |