Update README.md

584eb0e verified 16 days ago

4.04 kB

	---
	license: llama3.1
	datasets:
	- trollek/Danoia-v03
	- trollek/Danoia-v02
	- N8Programs/CreativeGPT
	- Gryphe/Opus-WritingPrompts
	language:
	- da
	- en
	base_model:
	- unsloth/Meta-Llama-3.1-8B-Instruct
	library_name: transformers
	tags:
	- llama-factory
	- unsloth
	---
	# Llama 3.1 8B Danoia

	This model is a fine-tuned version of [unsloth/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) on the danoia_v03, the opus_writing_instruct, the creativegpt and the danoia_v02_no_system datasets + some private datasets related to evaluation.

	It achieves the following results on the evaluation set:
	- Loss: 0.7108

	## Model description

	This model can write stories in danish and english. It can do much more, I am sure of it, but not more than the vanilla model it is based on.

	## Intended uses & limitations

	Danoia is intended to be private assistant able to write essays, summarise articles, and be a helpful assistant in general. It misspells danish words at times but it is rare though.

	## Training and evaluation data

	I trained this using [LLama-Factory](https://github.com/hiyouga/LLaMA-Factory "LLama Factorys' GitHub") with [unsloth](https://github.com/unslothai/unsloth "unsloths' GitHub") enabled on a 16GB 4060 Ti. It took 30 hours and peaked at 13GB VRAM usage.

	<details>

	<summary>Show LLama-Factory config</summary>

	```yaml
	### model
	model_name_or_path: unsloth/Meta-Llama-3.1-8B-Instruct

	### method
	stage: sft
	do_train: true
	finetuning_type: lora
	lora_target: all
	loraplus_lr_ratio: 16.0
	lora_rank: 16
	lora_alpha: 32
	use_unsloth: true
	use_unsloth_gc: true
	quantization_bit: 4
	upcast_layernorm: true
	seed: 192

	### dataset
	dataset: danoia_v03,opus_writing_instruct,creativegpt,danoia_v02_no_system
	template: llama3
	cutoff_len: 8192
	overwrite_cache: false
	preprocessing_num_workers: 12

	### output
	output_dir: llama31/8b_instruct/loras/danoia
	logging_steps: 1
	save_steps: 500
	save_strategy: steps
	plot_loss: true
	overwrite_output_dir: false

	### train
	per_device_train_batch_size: 2
	gradient_accumulation_steps: 4
	learning_rate: 1.5e-5
	num_train_epochs: 1.5
	lr_scheduler_type: cosine
	warmup_ratio: 0.01
	bf16: true

	## eval
	val_size: 0.01
	per_device_eval_batch_size: 1
	eval_strategy: steps
	eval_steps: 500
	```
	</details>

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1.5e-05
	- train_batch_size: 2
	- eval_batch_size: 1
	- seed: 192
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 8
	- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.01
	- num_epochs: 1.5

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:------:\|:-----:\|:---------------:\|
	\| 0.2352 \| 0.0719 \| 500 \| 0.8450 \|
	\| 0.1742 \| 0.1438 \| 1000 \| 0.8090 \|
	\| 0.1667 \| 0.2156 \| 1500 \| 0.7889 \|
	\| 0.3791 \| 0.2875 \| 2000 \| 0.7750 \|
	\| 0.1989 \| 0.3594 \| 2500 \| 0.7665 \|
	\| 0.2347 \| 0.4313 \| 3000 \| 0.7563 \|
	\| 0.1694 \| 0.5032 \| 3500 \| 0.7498 \|
	\| 0.2351 \| 0.5750 \| 4000 \| 0.7412 \|
	\| 0.2322 \| 0.6469 \| 4500 \| 0.7363 \|
	\| 0.1689 \| 0.7188 \| 5000 \| 0.7298 \|
	\| 0.1953 \| 0.7907 \| 5500 \| 0.7250 \|
	\| 0.2099 \| 0.8626 \| 6000 \| 0.7214 \|
	\| 0.2368 \| 0.9344 \| 6500 \| 0.7166 \|
	\| 0.1632 \| 1.0063 \| 7000 \| 0.7151 \|
	\| 0.1558 \| 1.0782 \| 7500 \| 0.7157 \|
	\| 0.2854 \| 1.1501 \| 8000 \| 0.7139 \|
	\| 0.199 \| 1.2220 \| 8500 \| 0.7127 \|
	\| 0.1606 \| 1.2938 \| 9000 \| 0.7117 \|
	\| 0.1788 \| 1.3657 \| 9500 \| 0.7112 \|
	\| 0.2618 \| 1.4376 \| 10000 \| 0.7109 \|


	### Framework versions

	- PEFT 0.12.0
	- Transformers 4.46.1
	- Pytorch 2.5.1
	- Datasets 3.1.0
	- Tokenizers 0.20.3