|
--- |
|
library_name: peft |
|
license: mit |
|
base_model: EleutherAI/gpt-neo-125m |
|
tags: |
|
- axolotl |
|
- generated_from_trainer |
|
model-index: |
|
- name: 19783dba-2611-430a-89e2-4d277105a2fb |
|
results: [] |
|
--- |
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.1` |
|
```yaml |
|
adapter: lora |
|
base_model: EleutherAI/gpt-neo-125m |
|
bf16: true |
|
chat_template: llama3 |
|
dataset_prepared_path: null |
|
datasets: |
|
- data_files: |
|
- 9400c082b072ce22_train_data.json |
|
ds_type: json |
|
format: custom |
|
path: /workspace/input_data/9400c082b072ce22_train_data.json |
|
type: |
|
field_instruction: ja |
|
field_output: en |
|
format: '{instruction}' |
|
no_input_format: '{instruction}' |
|
system_format: '{system}' |
|
system_prompt: '' |
|
debug: null |
|
deepspeed: null |
|
early_stopping_patience: 4 |
|
eval_max_new_tokens: 128 |
|
eval_steps: 150 |
|
eval_table_size: null |
|
flash_attention: false |
|
fp16: null |
|
fsdp: null |
|
fsdp_config: null |
|
gradient_accumulation_steps: 4 |
|
gradient_checkpointing: true |
|
group_by_length: false |
|
hub_model_id: Romain-XV/19783dba-2611-430a-89e2-4d277105a2fb |
|
hub_repo: null |
|
hub_strategy: checkpoint |
|
hub_token: null |
|
learning_rate: 0.0002 |
|
load_best_model_at_end: true |
|
load_in_4bit: true |
|
load_in_8bit: false |
|
local_rank: null |
|
logging_steps: 1 |
|
lora_alpha: 128 |
|
lora_dropout: 0.3 |
|
lora_fan_in_fan_out: null |
|
lora_model_dir: null |
|
lora_r: 64 |
|
lora_target_linear: true |
|
lora_target_modules: |
|
- q_proj |
|
- k_proj |
|
lr_scheduler: cosine |
|
max_grad_norm: 1.0 |
|
max_steps: 9660 |
|
micro_batch_size: 2 |
|
mlflow_experiment_name: /tmp/9400c082b072ce22_train_data.json |
|
model_type: AutoModelForCausalLM |
|
num_epochs: 3 |
|
optimizer: adamw_bnb_8bit |
|
output_dir: miner_id_24 |
|
pad_to_sequence_len: true |
|
resume_from_checkpoint: null |
|
s2_attention: null |
|
sample_packing: false |
|
save_steps: 150 |
|
sequence_len: 1024 |
|
special_tokens: |
|
pad_token: <|endoftext|> |
|
strict: false |
|
tf32: true |
|
tokenizer_type: AutoTokenizer |
|
train_on_inputs: false |
|
trust_remote_code: true |
|
val_set_size: 0.03388084783433621 |
|
wandb_entity: null |
|
wandb_mode: online |
|
wandb_name: 97f965f0-6c2c-4001-93f0-b3bd5a572767 |
|
wandb_project: Gradients-On-Demand |
|
wandb_run: your_name |
|
wandb_runid: 97f965f0-6c2c-4001-93f0-b3bd5a572767 |
|
warmup_steps: 10 |
|
weight_decay: 0.0 |
|
xformers_attention: null |
|
|
|
``` |
|
|
|
</details><br> |
|
|
|
# 19783dba-2611-430a-89e2-4d277105a2fb |
|
|
|
This model is a fine-tuned version of [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) on the `9400c082b072ce22_train_data.json` dataset (Japanese-to-English instruction pairs).
|
It achieves the following results on the evaluation set: |
|
- Loss: 1.9652 |
|
|
|
## Model description |
|
|
|
This is a LoRA adapter for [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m), trained with [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) 0.4.1. The adapter uses rank 64, alpha 128, and dropout 0.3, targets the attention projections (`q_proj`, `k_proj`) plus all linear layers (`lora_target_linear: true`), and was trained with the base model loaded in 4-bit precision. The training data pairs Japanese source text with English targets, so the adapter is oriented toward Japanese-to-English generation.
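
For reference, the LoRA settings above correspond roughly to the following PEFT configuration. This is a sketch only: axolotl builds the actual adapter configuration internally, and with `lora_target_linear: true` the target set extends to all linear layers of the base model.

```python
from peft import LoraConfig

# Approximate PEFT equivalent of the axolotl LoRA settings above
# (a sketch, not the exact object axolotl constructs).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.3,
    target_modules=["q_proj", "k_proj"],  # lora_target_linear: true additionally targets all linear layers
    task_type="CAUSAL_LM",
)
```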
|
|
|
## Intended uses & limitations |
|
|
|
The adapter is intended for Japanese-to-English generation using the plain `{instruction}` prompt format it was trained on: the Japanese source text is passed directly as the prompt and the model is expected to continue with an English rendering. The 125M-parameter base model is small, and no evaluation beyond the validation loss reported below is provided here, so outputs should be reviewed before downstream use.
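
A minimal inference sketch, assuming the adapter is loaded from the Hub repository named in the config (`hub_model_id`) and that prompts follow the plain `{instruction}` format used during training:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "EleutherAI/gpt-neo-125m"
adapter_id = "Romain-XV/19783dba-2611-430a-89e2-4d277105a2fb"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter
model.eval()

# Training used the raw Japanese text as the '{instruction}' prompt,
# so inference passes the source sentence directly.
prompt = "こんにちは、お元気ですか？"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```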
|
|
|
## Training and evaluation data |
|
|
|
Training used `9400c082b072ce22_train_data.json`, where each record maps a Japanese source text (`ja`, used as the instruction) to an English target text (`en`, used as the output). About 3.4% of the data (`val_set_size` ≈ 0.034) was held out as the evaluation set.
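
A hypothetical illustration of what a single training record presumably looks like, based on the field mapping in the config above (the actual file contents are not published with this card):

```python
# Hypothetical record; the real dataset file is not included in this card.
example_record = {
    "ja": "猫はテーブルの上で寝ています。",        # Japanese source, mapped via field_instruction
    "en": "The cat is sleeping on the table.",  # English target, mapped via field_output
}
```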
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0002 |
|
- train_batch_size: 2 |
|
- eval_batch_size: 2 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 4 |
|
- total_train_batch_size: 8 |
|
- optimizer: 8-bit AdamW (`adamw_bnb_8bit`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 10 |
|
- training_steps: 9660 |
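
Training was launched through Axolotl (for version 0.4.1, typically `accelerate launch -m axolotl.cli.train config.yaml` with the config shown above). As a standalone sketch, the optimizer/scheduler pairing listed above is roughly equivalent to:

```python
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

# Sketch of the 8-bit AdamW + cosine-with-warmup setup that axolotl/transformers
# configure internally during training; not the exact training loop.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")
optimizer = bnb.optim.AdamW8bit(
    model.parameters(), lr=2e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=9660
)
```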
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:------:|:----:|:---------------:| |
|
| 22.3081 | 0.0001 | 1 | 5.3931 | |
|
| 12.1969 | 0.0084 | 150 | 3.0791 | |
|
| 12.8922 | 0.0168 | 300 | 2.9534 | |
|
| 12.5402 | 0.0252 | 450 | 2.8506 | |
|
| 10.4376 | 0.0337 | 600 | 2.7774 | |
|
| 9.6342 | 0.0421 | 750 | 2.7315 | |
|
| 11.2411 | 0.0505 | 900 | 2.6845 | |
|
| 12.9895 | 0.0589 | 1050 | 2.6467 | |
|
| 12.0386 | 0.0673 | 1200 | 2.6053 | |
|
| 11.1212 | 0.0757 | 1350 | 2.5762 | |
|
| 9.9151 | 0.0842 | 1500 | 2.5369 | |
|
| 11.2688 | 0.0926 | 1650 | 2.5001 | |
|
| 10.0221 | 0.1010 | 1800 | 2.4761 | |
|
| 10.3173 | 0.1094 | 1950 | 2.4375 | |
|
| 10.428 | 0.1178 | 2100 | 2.4349 | |
|
| 7.0369 | 0.1262 | 2250 | 2.3860 | |
|
| 10.4659 | 0.1347 | 2400 | 2.3725 | |
|
| 10.4187 | 0.1431 | 2550 | 2.3579 | |
|
| 6.8085 | 0.1515 | 2700 | 2.3285 | |
|
| 12.6518 | 0.1599 | 2850 | 2.3110 | |
|
| 9.9324 | 0.1683 | 3000 | 2.2927 | |
|
| 7.8111 | 0.1767 | 3150 | 2.2739 | |
|
| 9.2326 | 0.1852 | 3300 | 2.2593 | |
|
| 8.6382 | 0.1936 | 3450 | 2.2342 | |
|
| 8.518 | 0.2020 | 3600 | 2.2290 | |
|
| 6.4198 | 0.2104 | 3750 | 2.2118 | |
|
| 9.0537 | 0.2188 | 3900 | 2.2064 | |
|
| 6.6054 | 0.2272 | 4050 | 2.1808 | |
|
| 8.1502 | 0.2357 | 4200 | 2.1758 | |
|
| 7.229 | 0.2441 | 4350 | 2.1579 | |
|
| 7.0952 | 0.2525 | 4500 | 2.1411 | |
|
| 7.7773 | 0.2609 | 4650 | 2.1294 | |
|
| 9.354 | 0.2693 | 4800 | 2.1157 | |
|
| 9.6896 | 0.2777 | 4950 | 2.1120 | |
|
| 9.817 | 0.2862 | 5100 | 2.0999 | |
|
| 9.7308 | 0.2946 | 5250 | 2.0837 | |
|
| 7.0272 | 0.3030 | 5400 | 2.0796 | |
|
| 9.446 | 0.3114 | 5550 | 2.0694 | |
|
| 9.1402 | 0.3198 | 5700 | 2.0556 | |
|
| 7.8589 | 0.3282 | 5850 | 2.0542 | |
|
| 8.3354 | 0.3367 | 6000 | 2.0445 | |
|
| 8.081 | 0.3451 | 6150 | 2.0343 | |
|
| 6.7192 | 0.3535 | 6300 | 2.0259 | |
|
| 10.2732 | 0.3619 | 6450 | 2.0235 | |
|
| 9.3245 | 0.3703 | 6600 | 2.0137 | |
|
| 8.6904 | 0.3787 | 6750 | 2.0092 | |
|
| 6.4253 | 0.3872 | 6900 | 2.0042 | |
|
| 8.0254 | 0.3956 | 7050 | 1.9975 | |
|
| 10.3048 | 0.4040 | 7200 | 1.9963 | |
|
| 9.2663 | 0.4124 | 7350 | 1.9909 | |
|
| 8.596 | 0.4208 | 7500 | 1.9860 | |
|
| 9.4026 | 0.4292 | 7650 | 1.9820 | |
|
| 7.5361 | 0.4377 | 7800 | 1.9791 | |
|
| 10.1732 | 0.4461 | 7950 | 1.9773 | |
|
| 9.5052 | 0.4545 | 8100 | 1.9737 | |
|
| 9.1775 | 0.4629 | 8250 | 1.9720 | |
|
| 5.179 | 0.4713 | 8400 | 1.9702 | |
|
| 6.0604 | 0.4797 | 8550 | 1.9688 | |
|
| 7.6645 | 0.4882 | 8700 | 1.9676 | |
|
| 6.7768 | 0.4966 | 8850 | 1.9666 | |
|
| 8.6168 | 0.5050 | 9000 | 1.9657 | |
|
| 9.4105 | 0.5134 | 9150 | 1.9658 | |
|
| 8.4106 | 0.5218 | 9300 | 1.9655 | |
|
| 5.5724 | 0.5302 | 9450 | 1.9654 | |
|
| 5.8533 | 0.5387 | 9600 | 1.9652 | |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.13.2 |
|
- Transformers 4.46.0 |
|
- Pytorch 2.5.0+cu124 |
|
- Datasets 3.0.1 |
|
- Tokenizers 0.20.1 |