|
--- |
|
license: llama2 |
|
library_name: peft |
|
tags: |
|
- axolotl |
|
- generated_from_trainer |
|
base_model: codellama/CodeLlama-7b-hf |
|
model-index: |
|
- name: EvolCodeLlama-7b |
|
results: [] |
|
--- |
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.0` |
|
```yaml |
|
base_model: codellama/CodeLlama-7b-hf |
|
base_model_config: codellama/CodeLlama-7b-hf |
|
model_type: LlamaForCausalLM |
|
tokenizer_type: LlamaTokenizer |
|
is_llama_derived_model: true |
|
hub_model_id: EvolCodeLlama-7b |
|
|
|
load_in_8bit: false |
|
load_in_4bit: true |
|
strict: false |
|
|
|
datasets: |
|
- path: mlabonne/Evol-Instruct-Python-1k |
|
type: alpaca |
|
dataset_prepared_path: last_run_prepared |
|
val_set_size: 0.02 |
|
output_dir: ./qlora-out |
|
|
|
adapter: qlora |
|
lora_model_dir: |
|
|
|
sequence_len: 2048 |
|
sample_packing: true |
|
|
|
lora_r: 32 |
|
lora_alpha: 16 |
|
lora_dropout: 0.05 |
|
lora_target_modules: |
|
lora_target_linear: true |
|
lora_fan_in_fan_out: |
|
|
|
wandb_project: FTCodeLlama-2 |
|
wandb_entity: |
|
wandb_watch: |
|
wandb_run_id: |
|
wandb_log_model: |
|
|
|
gradient_accumulation_steps: 2 |
|
micro_batch_size: 4 |
|
num_epochs: 3 |
|
optimizer: paged_adamw_32bit |
|
lr_scheduler: cosine |
|
learning_rate: 0.0002 |
|
|
|
train_on_inputs: false |
|
group_by_length: false |
|
bf16: true |
|
fp16: false |
|
tf32: false |
|
|
|
gradient_checkpointing: true |
|
early_stopping_patience: |
|
resume_from_checkpoint: |
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: true |
|
|
|
warmup_steps: 100 |
|
eval_steps: 0.01 |
|
save_strategy: epoch |
|
save_steps: |
|
debug: |
|
deepspeed: |
|
weight_decay: 0.0 |
|
fsdp: |
|
fsdp_config: |
|
special_tokens: |
|
bos_token: "<s>" |
|
eos_token: "</s>" |
|
unk_token: "<unk>" |
|
|
|
``` |
|
|
|
</details><br> |
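With [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) installed, a config file like the one above is typically launched with `accelerate launch -m axolotl.cli.train config.yaml` (the CLI entry point in axolotl 0.4.0; consult the Axolotl README for the exact invocation on other versions).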
|
|
|
# EvolCodeLlama-7b |
|
|
|
This model is a QLoRA fine-tuned version of [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) on the [mlabonne/Evol-Instruct-Python-1k](https://huggingface.co/datasets/mlabonne/Evol-Instruct-Python-1k) dataset.
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.3754 |
|
|
|
## Model description |
|
|
|
EvolCodeLlama-7b is a QLoRA adapter for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf), trained with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) on 1,000 evolved Python coding instructions. This repository contains only the LoRA adapter weights (rank 32, alpha 16, applied to all linear layers); the base model is loaded separately at inference time.
|
|
|
## Intended uses & limitations |
|
|
|
The model is intended for Python code generation and explanation in an Alpaca-style instruction format (see the usage sketch below). It is distributed under the Llama 2 license and inherits the limitations of its base model: generated code is not guaranteed to be correct or secure and should be reviewed before use.
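A minimal usage sketch follows. It assumes the adapter is published as `mlabonne/EvolCodeLlama-7b` (the config above only sets `hub_model_id: EvolCodeLlama-7b`; substitute the actual Hub path) and formats the prompt with the standard Alpaca template implied by the `type: alpaca` dataset setting:

```python
# Minimal inference sketch (the Hub repo id below is an assumption --
# substitute the actual path where this adapter is published).
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Loads the base model declared in the adapter config, then applies the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    "mlabonne/EvolCodeLlama-7b",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# The dataset used `type: alpaca`, so prompts should follow the Alpaca
# instruction template used at training time.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a Python function that checks whether a string is a palindrome.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```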
|
|
|
## Training and evaluation data |
|
|
|
The model was fine-tuned on [mlabonne/Evol-Instruct-Python-1k](https://huggingface.co/datasets/mlabonne/Evol-Instruct-Python-1k), a 1,000-sample set of evolved Python coding instructions in Alpaca format. 2% of the samples (`val_set_size: 0.02`) were held out as the evaluation set reported below.
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0002 |
|
- train_batch_size: 4 |
|
- eval_batch_size: 4 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 2 |
|
- total_train_batch_size: 8 |
|
- optimizer: Paged AdamW (32-bit) with betas=(0.9, 0.999) and epsilon=1e-08
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 100 |
|
- num_epochs: 3 |
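For reference, the adapter and quantization settings above map roughly onto the following PEFT / bitsandbytes objects. This is a sketch of equivalent settings, not the exact objects Axolotl builds internally; in particular, the `target_modules` list is an assumed expansion of `lora_target_linear: true` for Llama-style models:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit base-model quantization (load_in_4bit: true in the config above).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches bf16: true
)

# LoRA adapter matching lora_r / lora_alpha / lora_dropout above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Assumed expansion of `lora_target_linear: true` for Llama-style models.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```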
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.3686 | 0.01 | 1 | 0.5015 |
| 0.4397 | 0.03 | 3 | 0.5013 |
| 0.4919 | 0.06 | 6 | 0.5013 |
| 0.3191 | 0.09 | 9 | 0.5011 |
| 0.2514 | 0.12 | 12 | 0.5003 |
| 0.3379 | 0.15 | 15 | 0.4992 |
| 0.4712 | 0.19 | 18 | 0.4969 |
| 0.3801 | 0.22 | 21 | 0.4922 |
| 0.3482 | 0.25 | 24 | 0.4833 |
| 0.4113 | 0.28 | 27 | 0.4702 |
| 0.2524 | 0.31 | 30 | 0.4552 |
| 0.2641 | 0.34 | 33 | 0.4415 |
| 0.3554 | 0.37 | 36 | 0.4302 |
| 0.2384 | 0.4 | 39 | 0.4213 |
| 0.2131 | 0.43 | 42 | 0.4153 |
| 0.2308 | 0.46 | 45 | 0.4105 |
| 0.3478 | 0.49 | 48 | 0.4053 |
| 0.2913 | 0.53 | 51 | 0.4003 |
| 0.2909 | 0.56 | 54 | 0.3956 |
| 0.2032 | 0.59 | 57 | 0.3928 |
| 0.2479 | 0.62 | 60 | 0.3906 |
| 0.2145 | 0.65 | 63 | 0.3890 |
| 0.2447 | 0.68 | 66 | 0.3882 |
| 0.2928 | 0.71 | 69 | 0.3876 |
| 0.384 | 0.74 | 72 | 0.3854 |
| 0.1751 | 0.77 | 75 | 0.3835 |
| 0.352 | 0.8 | 78 | 0.3818 |
| 0.2443 | 0.84 | 81 | 0.3808 |
| 0.3211 | 0.87 | 84 | 0.3798 |
| 0.3026 | 0.9 | 87 | 0.3788 |
| 0.2357 | 0.93 | 90 | 0.3776 |
| 0.2661 | 0.96 | 93 | 0.3755 |
| 0.3314 | 0.99 | 96 | 0.3751 |
| 0.2789 | 1.02 | 99 | 0.3742 |
| 0.1734 | 1.03 | 102 | 0.3744 |
| 0.1928 | 1.06 | 105 | 0.3761 |
| 0.2681 | 1.09 | 108 | 0.3753 |
| 0.4148 | 1.12 | 111 | 0.3750 |
| 0.1977 | 1.15 | 114 | 0.3744 |
| 0.1977 | 1.19 | 117 | 0.3740 |
| 0.2499 | 1.22 | 120 | 0.3742 |
| 0.2192 | 1.25 | 123 | 0.3730 |
| 0.2207 | 1.28 | 126 | 0.3723 |
| 0.2179 | 1.31 | 129 | 0.3718 |
| 0.2843 | 1.34 | 132 | 0.3734 |
| 0.2614 | 1.37 | 135 | 0.3721 |
| 0.2033 | 1.4 | 138 | 0.3705 |
| 0.212 | 1.43 | 141 | 0.3705 |
| 0.2307 | 1.46 | 144 | 0.3712 |
| 0.3182 | 1.49 | 147 | 0.3698 |
| 0.2467 | 1.53 | 150 | 0.3664 |
| 0.1909 | 1.56 | 153 | 0.3665 |
| 0.3286 | 1.59 | 156 | 0.3655 |
| 0.2195 | 1.62 | 159 | 0.3648 |
| 0.3231 | 1.65 | 162 | 0.3650 |
| 0.2922 | 1.68 | 165 | 0.3631 |
| 0.1956 | 1.71 | 168 | 0.3627 |
| 0.2299 | 1.74 | 171 | 0.3639 |
| 0.1585 | 1.77 | 174 | 0.3649 |
| 0.2289 | 1.8 | 177 | 0.3650 |
| 0.189 | 1.84 | 180 | 0.3643 |
| 0.2736 | 1.87 | 183 | 0.3628 |
| 0.3591 | 1.9 | 186 | 0.3614 |
| 0.3181 | 1.93 | 189 | 0.3612 |
| 0.1994 | 1.96 | 192 | 0.3612 |
| 0.2499 | 1.99 | 195 | 0.3618 |
| 0.1659 | 2.01 | 198 | 0.3627 |
| 0.231 | 2.04 | 201 | 0.3665 |
| 0.169 | 2.07 | 204 | 0.3744 |
| 0.2082 | 2.1 | 207 | 0.3800 |
| 0.1755 | 2.13 | 210 | 0.3770 |
| 0.1959 | 2.16 | 213 | 0.3721 |
| 0.1933 | 2.19 | 216 | 0.3705 |
| 0.1213 | 2.22 | 219 | 0.3712 |
| 0.237 | 2.25 | 222 | 0.3738 |
| 0.2277 | 2.28 | 225 | 0.3771 |
| 0.2832 | 2.31 | 228 | 0.3789 |
| 0.2039 | 2.35 | 231 | 0.3783 |
| 0.2302 | 2.38 | 234 | 0.3764 |
| 0.1562 | 2.41 | 237 | 0.3750 |
| 0.1688 | 2.44 | 240 | 0.3742 |
| 0.126 | 2.47 | 243 | 0.3741 |
| 0.1846 | 2.5 | 246 | 0.3746 |
| 0.2195 | 2.53 | 249 | 0.3745 |
| 0.2335 | 2.56 | 252 | 0.3749 |
| 0.1542 | 2.59 | 255 | 0.3750 |
| 0.1783 | 2.62 | 258 | 0.3755 |
| 0.2409 | 2.65 | 261 | 0.3762 |
| 0.1777 | 2.69 | 264 | 0.3762 |
| 0.2591 | 2.72 | 267 | 0.3761 |
| 0.2092 | 2.75 | 270 | 0.3757 |
| 0.2256 | 2.78 | 273 | 0.3757 |
| 0.1923 | 2.81 | 276 | 0.3756 |
| 0.156 | 2.84 | 279 | 0.3755 |
| 0.2036 | 2.87 | 282 | 0.3754 |
| 0.2254 | 2.9 | 285 | 0.3753 |
| 0.1683 | 2.93 | 288 | 0.3753 |
| 0.1528 | 2.96 | 291 | 0.3754 |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.8.2 |
|
- Transformers 4.38.0.dev0 |
|
- Pytorch 2.1.2+cu118 |
|
- Datasets 2.17.0 |
|
- Tokenizers 0.15.0 |