|
--- |
|
library_name: transformers |
|
license: |
|
- llama3.1 |
|
- gemma |
|
base_model: google/gemma-2-27b |
|
tags: |
|
- axolotl |
|
- generated_from_trainer |
|
--- |
|
|
|
# Llama-Gemma-2-27b-SFT-trial1 |
|
|
|
## Overview
|
|
|
This model is an instruction-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b), built with supervised fine-tuning (SFT).
|
|
|
It was created and published as part of building a submission model for the competition of the [松尾研大規模言語モデル講座2024](https://weblab.t.u-tokyo.ac.jp/lecture/course-list/large-language-model/) (Matsuo Lab Large Language Model Course 2024).
|
|
|
This model is built with Llama and Qwen. |
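A minimal inference sketch with 🤗 Transformers is shown below. The repository id and generation settings are illustrative assumptions; the model was trained with the Gemma chat template (see the axolotl config below), so the prompt is built with `apply_chat_template`.

```python
# Minimal inference sketch (assumes the repository id
# "Aratako/Llama-Gemma-2-27b-SFT-trial1"; adjust to the actual repo name).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aratako/Llama-Gemma-2-27b-SFT-trial1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The model was tuned with the Gemma chat template, so apply_chat_template
# reproduces the prompt format used during training.
messages = [{"role": "user", "content": "日本の観光名所を3つ教えてください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```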
|
|
|
## Datasets Used
|
|
|
- [Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered](https://huggingface.co/datasets/Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered) |
|
- [Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted](https://huggingface.co/datasets/Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted) |
|
- [Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered](https://huggingface.co/datasets/Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered) |
|
- [Aratako/Open-Platypus-Japanese-masked-formatted](https://huggingface.co/datasets/Aratako/Open-Platypus-Japanese-masked-formatted) |
|
- [kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja](https://huggingface.co/datasets/kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja) |
|
- [kanhatakeyama/ramdom-to-fixed-multiturn-Calm3](https://huggingface.co/datasets/kanhatakeyama/ramdom-to-fixed-multiturn-Calm3) |
|
- [Aratako/magpie-ultra-v0.1-formatted](https://huggingface.co/datasets/Aratako/magpie-ultra-v0.1-formatted) |
|
- [Aratako/orca-agentinstruct-1M-v1-selected](https://huggingface.co/datasets/Aratako/orca-agentinstruct-1M-v1-selected) |
|
- [Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k](https://huggingface.co/datasets/Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k) |
|
|
|
## License
|
|
|
Because of the data used for training, this model is subject to the following licenses:
|
|
|
- It inherits the [META LLAMA 3.1 COMMUNITY LICENSE](https://www.llama.com/llama3_1/license/).
|
- It inherits the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
|
- It is affected by the [Qwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE). The license itself is not inherited, but a notice such as "Built with Qwen" must be included.
|
|
|
## Training Details
|
|
|
This model was trained with [axolotl](https://github.com/axolotl-ai-cloud/axolotl). For the training settings such as hyperparameters, see the auto-generated description below.
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.5.2` |
|
```yaml |
|
base_model: google/gemma-2-27b |
|
model_type: AutoModelForCausalLM |
|
tokenizer_type: AutoTokenizer |
|
|
|
hub_model_id: Aratako/fft-1 |
|
hub_strategy: "end" |
|
push_dataset_to_hub: |
|
hf_use_auth_token: true |
|
|
|
plugins: |
|
- axolotl.integrations.liger.LigerPlugin |
|
liger_cross_entropy: false |
|
liger_rope: true |
|
liger_rms_norm: true |
|
liger_swiglu: true |
|
liger_fused_linear_cross_entropy: true |
|
|
|
load_in_8bit: false |
|
load_in_4bit: false |
|
strict: false |
|
|
|
chat_template: gemma |
|
|
|
datasets: |
|
- path: Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered |
|
type: chat_template |
|
field_messages: messages |
|
message_field_role: role |
|
message_field_content: content |
|
- path: Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted |
|
type: chat_template |
|
field_messages: conversations |
|
message_field_role: role |
|
message_field_content: content |
|
- path: Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered |
|
type: chat_template |
|
field_messages: conversations |
|
message_field_role: role |
|
message_field_content: content |
|
- path: Aratako/Open-Platypus-Japanese-masked-formatted |
|
type: chat_template |
|
field_messages: conversations |
|
message_field_role: role |
|
message_field_content: content |
|
- path: kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja |
|
type: chat_template |
|
field_messages: messages |
|
message_field_role: role |
|
message_field_content: content |
|
- path: kanhatakeyama/ramdom-to-fixed-multiturn-Calm3 |
|
split: 20240806filtered |
|
type: chat_template |
|
field_messages: messages |
|
message_field_role: role |
|
message_field_content: content |
|
- path: Aratako/magpie-ultra-v0.1-formatted |
|
type: chat_template |
|
field_messages: conversations |
|
message_field_role: role |
|
message_field_content: content |
|
- path: Aratako/orca-agentinstruct-1M-v1-selected |
|
type: chat_template |
|
field_messages: messages |
|
message_field_role: role |
|
message_field_content: content |
|
- path: Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k |
|
type: chat_template |
|
field_messages: messages |
|
message_field_role: role |
|
message_field_content: content |
|
|
|
shuffle_merged_datasets: true |
|
dataset_prepared_path: /workspace/data/fft-data |
|
val_set_size: 0.003 |
|
output_dir: /workspace/data/27b-fft-out-1 |
|
|
|
sequence_len: 4096 |
|
sample_packing: true |
|
eval_sample_packing: false |
|
pad_to_sequence_len: true |
|
|
|
adapter: |
|
lora_model_dir: |
|
lora_r: |
|
lora_alpha: |
|
lora_dropout: |
|
lora_target_linear: |
|
lora_fan_in_fan_out: |
|
|
|
wandb_project: 27b-fft |
|
wandb_entity: aratako-lm |
|
wandb_watch: |
|
wandb_name: attempt-01 |
|
wandb_log_model: |
|
|
|
gradient_accumulation_steps: 4 |
|
micro_batch_size: 8 |
|
num_epochs: 2 |
|
optimizer: paged_adamw_8bit |
|
lr_scheduler: |
|
cosine_min_lr_ratio: 0.1 |
|
learning_rate: 0.00001 |
|
|
|
train_on_inputs: false |
|
group_by_length: false |
|
bf16: auto |
|
fp16: |
|
tf32: false |
|
|
|
gradient_checkpointing: true |
|
early_stopping_patience: |
|
auto_resume_from_checkpoints: true |
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: true |
|
|
|
save_strategy: steps |
|
save_steps: 100 |
|
save_total_limit: 2 |
|
|
|
warmup_steps: 10 |
|
eval_steps: 100 |
|
eval_batch_size: 1 |
|
eval_table_size: |
|
eval_max_new_tokens: |
|
debug: |
|
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json |
|
weight_decay: 0.01 |
|
fsdp: |
|
fsdp_config: |
|
special_tokens: |
|
pad_token: <pad> |
|
|
|
``` |
|
|
|
</details><br> |
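For reference, every dataset entry in the config above uses axolotl's `chat_template` prompt strategy: each record provides a list of role/content messages under the field named by `field_messages` (`messages` or `conversations`), rendered with the Gemma chat template. A sketch of the record shape these settings assume (the content is hypothetical):

```python
# Illustrative record shape for the chat_template prompt strategy above.
# The field name ("messages" here) corresponds to field_messages in the config;
# the role/content keys correspond to message_field_role / message_field_content.
example_record = {
    "messages": [
        {"role": "user", "content": "富士山の標高を教えてください。"},
        {"role": "assistant", "content": "富士山の標高は約3,776メートルです。"},
    ]
}
```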
|
|
|
# fft-1 |
|
|
|
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on the datasets listed above.
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.6122 |
|
|
|
## Model description |
|
|
|
See the Overview section above.
|
|
|
## Intended uses & limitations |
|
|
|
See the Overview and License sections above for intended use and license-related constraints.
|
|
|
## Training and evaluation data |
|
|
|
See the Datasets Used section above; 0.3% of the merged data was held out as the evaluation set (`val_set_size: 0.003`).
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (the effective batch size is derived in the short sketch after this list):
|
- learning_rate: 1e-05 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 1 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 7 |
|
- gradient_accumulation_steps: 4 |
|
- total_train_batch_size: 224 |
|
- total_eval_batch_size: 7 |
|
- optimizer: paged_adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 10 |
|
- num_epochs: 2 |
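The total train batch size above follows directly from the per-device micro batch size, the gradient accumulation steps, and the number of devices:

```python
# Effective (total) train batch size implied by the settings above.
micro_batch_size = 8
gradient_accumulation_steps = 4
num_devices = 7

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 224
```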
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9427        | 0.0020 | 1    | 0.9940          |
| 0.6566        | 0.2043 | 100  | 0.6648          |
| 0.6609        | 0.4086 | 200  | 0.6430          |
| 0.6457        | 0.6129 | 300  | 0.6306          |
| 0.6322        | 0.8172 | 400  | 0.6203          |
| 0.5082        | 1.0204 | 500  | 0.6238          |
| 0.5348        | 1.2247 | 600  | 0.6212          |
| 0.5253        | 1.4290 | 700  | 0.6181          |
| 0.5136        | 1.6333 | 800  | 0.6147          |
| 0.5125        | 1.8376 | 900  | 0.6122          |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.46.3 |
|
- Pytorch 2.3.1+cu121 |
|
- Datasets 3.1.0 |
|
- Tokenizers 0.20.3 |
|
|