phi2-bunny / README.md

Update README.md

8ace388 verified 9 months ago

4.93 kB

	---
	license: mit
	library_name: transformers
	tags:
	- axolotl
	- generated_from_trainer
	base_model: microsoft/phi-2
	model-index:
	- name: phi2-bunny
	results: []
	datasets:
	- WhiteRabbitNeo/WRN-Chapter-1
	- WhiteRabbitNeo/WRN-Chapter-2
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.4.0`
	```yaml
	base_model: microsoft/phi-2
	model_type: AutoModelForCausalLM
	tokenizer_type: AutoTokenizer
	is_llama_derived_model: false
	# trust_remote_code: true

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	datasets:
	- path: WhiteRabbitNeo/WRN-Chapter-1
	type:
	system_prompt: ""
	field_system: system
	field_instruction: instruction
	field_output: response
	prompt_style: chatml
	- path: WhiteRabbitNeo/WRN-Chapter-2
	type:
	system_prompt: ""
	field_system: system
	field_instruction: instruction
	field_output: response
	prompt_style: chatml

	dataset_prepared_path: ./phi2-bunny/last-run-prepared
	val_set_size: 0.05
	output_dir: ./phi2-bunny/

	sequence_len: 2048
	sample_packing: true
	pad_to_sequence_len: true

	adapter: lora
	lora_model_dir:
	lora_r: 64
	lora_alpha: 32
	lora_dropout: 0.05
	lora_target_linear: true
	lora_fan_in_fan_out:
	lora_modules_to_save:
	- embed_tokens
	- lm_head


	hub_model_id: justinj92/phi2-bunny

	wandb_project: phi2-bunny
	wandb_entity: justinjoy-5
	wandb_watch:
	wandb_name:
	wandb_log_model:

	gradient_accumulation_steps: 8
	micro_batch_size: 2
	num_epochs: 5
	optimizer: paged_adamw_8bit
	adam_beta1: 0.9
	adam_beta2: 0.999
	adam_epsilon: 0.00001
	max_grad_norm: 1000.0
	lr_scheduler: cosine
	learning_rate: 0.0002

	train_on_inputs: false
	group_by_length: true
	bf16: true
	fp16: false
	tf32: true

	gradient_checkpointing: true
	early_stopping_patience:
	resume_from_checkpoint:
	auto_resume_from_checkpoints:
	local_rank:
	logging_steps: 1
	xformers_attention:
	flash_attention: true
	chat_template: chatml

	warmup_steps: 100
	evals_per_epoch: 4
	save_steps: 0.01
	save_total_limit: 2
	debug:
	deepspeed:
	weight_decay: 0.01
	fsdp:
	fsdp_config:
	resize_token_embeddings_to_32x: true
	special_tokens:
	eos_token: "<\|im_end\|>"
	pad_token: "<\|endoftext\|>"
	tokens:
	- "<\|im_start\|>"

	```

	</details><br>

	## Hardware

	Azure 1xNC_H100 VM - 8 Hours Training Time

	# phi2-bunny

	This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the WhiteRabbit Cybersecurity dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.5347

	## Model description

	Phi-2 SLM

	## Intended uses & limitations

	Research & Learning

	## ChatML Prompt

	<\|im_start\|>system
	You are Bunny, a helpful AI cyber researcher. Answer the Question in a logical, step-by-step manner that makes the reasoning process clear. Carefully analyze the question to identify the core issue or problem to be solved.<\|im_end\|>
	<\|im_start\|>user
	{prompt}<\|im_end\|>
	<\|im_start\|>assistant


	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-05
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 5

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:-----:\|:----:\|:---------------:\|
	\| 0.8645 \| 0.0 \| 1 \| 0.7932 \|
	\| 0.6246 \| 0.25 \| 228 \| 0.6771 \|
	\| 0.6449 \| 0.5 \| 456 \| 0.6186 \|
	\| 0.6658 \| 0.75 \| 684 \| 0.6073 \|
	\| 0.5419 \| 1.0 \| 912 \| 0.5911 \|
	\| 0.5477 \| 1.24 \| 1140 \| 0.5878 \|
	\| 0.612 \| 1.49 \| 1368 \| 0.5715 \|
	\| 0.6328 \| 1.74 \| 1596 \| 0.5632 \|
	\| 0.5082 \| 1.99 \| 1824 \| 0.5534 \|
	\| 0.5807 \| 2.24 \| 2052 \| 0.5513 \|
	\| 0.4775 \| 2.49 \| 2280 \| 0.5448 \|
	\| 0.514 \| 2.74 \| 2508 \| 0.5430 \|
	\| 0.4943 \| 2.99 \| 2736 \| 0.5398 \|
	\| 0.5012 \| 3.22 \| 2964 \| 0.5396 \|
	\| 0.5203 \| 3.48 \| 3192 \| 0.5371 \|
	\| 0.5112 \| 3.73 \| 3420 \| 0.5356 \|
	\| 0.4978 \| 3.98 \| 3648 \| 0.5351 \|
	\| 0.5642 \| 4.22 \| 3876 \| 0.5348 \|
	\| 0.5383 \| 4.47 \| 4104 \| 0.5348 \|
	\| 0.4679 \| 4.72 \| 4332 \| 0.5347 \|


	### Framework versions

	- PEFT 0.8.1.dev0
	- Transformers 4.37.0
	- Pytorch 2.1.2+cu121
	- Datasets 2.16.1
	- Tokenizers 0.15.0