xuanye/llama3_question


	---
	tags:
	- generated_from_trainer
	model-index:
	- name: llama3_question
	  results: []
	library_name: peft
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# llama3_question

	This model is a PEFT adapter produced by the Trainer. The card records neither the base model nor the training dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.8999

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure


	The following `bitsandbytes` quantization config was used during training:
	- quant_method: bitsandbytes
	- load_in_8bit: False
	- load_in_4bit: True
	- llm_int8_threshold: 6.0
	- llm_int8_skip_modules: None
	- llm_int8_enable_fp32_cpu_offload: False
	- llm_int8_has_fp16_weight: False
	- bnb_4bit_quant_type: nf4
	- bnb_4bit_use_double_quant: False
	- bnb_4bit_compute_dtype: float16
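
The list above maps directly onto the keyword arguments of `transformers.BitsAndBytesConfig`. The following is a minimal sketch of how the same settings could be rebuilt, not the original training script. The names `BNB_KWARGS` and `build_bnb_config` are illustrative, and the import is deferred so that the file can be read without `transformers` or `bitsandbytes` installed.

```python
# Keyword arguments copied from the quantization list in this card.
# `quant_method` is omitted because BitsAndBytesConfig sets it internally.
BNB_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    # Passed as a string; the config class resolves it to torch.float16.
    "bnb_4bit_compute_dtype": "float16",
}


def build_bnb_config():
    """Build the quantization config (hypothetical helper, not from the card).

    The import is local so that loading this module does not require
    transformers, torch, or bitsandbytes to be installed.
    """
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**BNB_KWARGS)
```

The returned object would typically be passed as `quantization_config=` to `from_pretrained` when loading the base model, which this card does not name.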

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: constant
	- lr_scheduler_warmup_ratio: 0.03
	- num_epochs: 6
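
For reference, the hyperparameters above could be expressed as `transformers.TrainingArguments` roughly as follows. This is a sketch under assumptions: the `output_dir` value is hypothetical, the batch sizes are taken to be per-device on a single device, and any argument not listed in the card is left at its library default.

```python
# Hyperparameters copied from the list in this card, keyed by the
# corresponding TrainingArguments parameter names.
TRAINING_KWARGS = {
    "output_dir": "llama3_question",  # hypothetical; not stated in the card
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,   # assumes a single device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.03,
    "num_train_epochs": 6,
}


def build_training_args():
    """Build TrainingArguments (hypothetical helper, not from the card).

    The import is local so that loading this module does not require
    transformers or torch to be installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)
```

Note that, as far as the Transformers scheduler factory is concerned, the plain `constant` schedule takes no warmup argument, so the warmup ratio of 0.03 likely had no effect on this run.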

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 2.9948 | 0.14 | 1 | 2.8184 |
	| 2.8697 | 0.29 | 2 | 2.6592 |
	| 2.6264 | 0.43 | 3 | 2.4946 |
	| 2.625 | 0.57 | 4 | 2.3588 |
	| 2.3888 | 0.71 | 5 | 2.2385 |
	| 2.2949 | 0.86 | 6 | 2.1219 |
	| 2.5261 | 1.0 | 7 | 2.0221 |
	| 2.0264 | 1.14 | 8 | 1.9246 |
	| 1.9661 | 1.29 | 9 | 1.8298 |
	| 1.9106 | 1.43 | 10 | 1.7456 |
	| 1.8448 | 1.57 | 11 | 1.6686 |
	| 1.619 | 1.71 | 12 | 1.6050 |
	| 1.5881 | 1.86 | 13 | 1.5468 |
	| 1.6859 | 2.0 | 14 | 1.4939 |
	| 1.4643 | 2.14 | 15 | 1.4453 |
	| 1.4583 | 2.29 | 16 | 1.3949 |
	| 1.4086 | 2.43 | 17 | 1.3441 |
	| 1.3314 | 2.57 | 18 | 1.2914 |
	| 1.3502 | 2.71 | 19 | 1.2400 |
	| 1.226 | 2.86 | 20 | 1.1892 |
	| 1.073 | 3.0 | 21 | 1.1445 |
	| 1.1113 | 3.14 | 22 | 1.0995 |
	| 1.1292 | 3.29 | 23 | 1.0570 |
	| 1.0242 | 3.43 | 24 | 1.0164 |
	| 0.9279 | 3.57 | 25 | 0.9826 |
	| 0.8518 | 3.71 | 26 | 0.9617 |
	| 1.0302 | 3.86 | 27 | 0.9491 |
	| 1.1736 | 4.0 | 28 | 0.9418 |
	| 0.8832 | 4.14 | 29 | 0.9352 |
	| 0.9151 | 4.29 | 30 | 0.9301 |
	| 0.7495 | 4.43 | 31 | 0.9256 |
	| 0.8785 | 4.57 | 32 | 0.9220 |
	| 0.8635 | 4.71 | 33 | 0.9180 |
	| 0.9499 | 4.86 | 34 | 0.9150 |
	| 0.8744 | 5.0 | 35 | 0.9125 |
	| 0.8221 | 5.14 | 36 | 0.9093 |
	| 0.7826 | 5.29 | 37 | 0.9064 |
	| 0.8421 | 5.43 | 38 | 0.9047 |
	| 0.8155 | 5.57 | 39 | 0.9029 |
	| 0.9097 | 5.71 | 40 | 0.9010 |
	| 0.7449 | 5.86 | 41 | 0.9003 |
	| 0.9502 | 6.0 | 42 | 0.8999 |
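
The table logs 42 optimizer steps over 6 epochs, which hints at the size of the training set even though the card does not state it. The sketch below works through that arithmetic. It assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept; none of these is confirmed by the card, so the result is an estimate rather than a recorded figure.

```python
import math

# Values read from this card.
total_steps = 42   # last "Step" entry in the results table
num_epochs = 6     # num_epochs hyperparameter
batch_size = 8     # train_batch_size hyperparameter

# Steps per epoch follow from the table (the epoch column advances by ~1/7).
steps_per_epoch = total_steps // num_epochs

# With ceil(n / batch_size) == steps_per_epoch, n lies in a fixed range.
# Assumes one device, gradient accumulation of 1, and no dropped last batch.
min_examples = (steps_per_epoch - 1) * batch_size + 1
max_examples = steps_per_epoch * batch_size

# Sanity check: both ends of the range give the observed step count.
assert math.ceil(min_examples / batch_size) == steps_per_epoch
assert math.ceil(max_examples / batch_size) == steps_per_epoch

print(f"steps per epoch: {steps_per_epoch}")
print(f"estimated training examples: {min_examples} to {max_examples}")
```

Under those assumptions the run used 7 steps per epoch, which corresponds to roughly 49 to 56 training examples.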


	### Framework versions

	- PEFT 0.5.0
	- Transformers 4.37.2
	- Pytorch 2.1.2
	- Datasets 2.18.0
	- Tokenizers 0.15.1
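
To compare a local environment against the versions listed above, a small standard-library check along these lines may help. The pip distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to match the frameworks named in the list, and `compare_versions` is an illustrative helper, not part of any of these libraries.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list in this card,
# keyed by the assumed pip distribution name.
PINNED = {
    "peft": "0.5.0",
    "transformers": "4.37.2",
    "torch": "2.1.2",
    "datasets": "2.18.0",
    "tokenizers": "0.15.1",
}


def compare_versions(pinned=PINNED):
    """Return {package: (wanted, installed, matches)} for each pinned package.

    `installed` is None when the package is not found. A prefix match is
    used so that local build suffixes such as "2.1.2+cu121" still count.
    """
    report = {}
    for package, wanted in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        matches = installed is not None and installed.startswith(wanted)
        report[package] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, matches) in compare_versions().items():
        status = "ok" if matches else "differs"
        print(f"{package}: wanted {wanted}, installed {installed} ({status})")
```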