---
license: other
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: NousResearch/Meta-Llama-3-8B
model-index:
- name: llama-3-8B-semeval2014
  results: []
language:
- en
metrics:
- f1
---


[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: NousResearch/Meta-Llama-3-8B

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: semeval2014_train.jsonl
    ds_type: json
    type:
      # JSONL file contains instruction, input, output fields per line.
      # This gets mapped to the equivalent axolotl tags.
      field_instruction: instruction
      field_input: input
      field_output: output
      # Format is used by axolotl to generate the prompt.
      format: |-
        [INST] {input} [/INST]

tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - "[/INST]"

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 4096
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head

wandb_project: absa-semeval2014
wandb_entity: psimm
wandb_log_model:
wandb_name: llama-3-8B-semeval2014

hub_model_id: psimm/llama-3-8B-semeval2014

gradient_accumulation_steps: 1
micro_batch_size: 32
num_epochs: 4
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0001

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0.05
eval_table_size:
eval_table_max_new_tokens: 128
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>

```

</details><br>
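The `datasets` entry in the config renders each JSONL record through the `format` template, and `train_on_inputs: false` keeps the prompt tokens out of the loss. The following is a minimal plain-Python sketch of that idea, not axolotl's actual code: the whitespace tokenizer, the function names, and the example `output` text are hypothetical, and the real pipeline uses the Llama 3 tokenizer and also handles BOS/EOS tokens.

```python
from typing import Callable

# Hypothetical sketch of prompt rendering and label masking; not axolotl's code.
PROMPT_FORMAT = "[INST] {input} [/INST]"  # the `format` value from the config
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def render_prompt(record: dict) -> str:
    """Fill the template with the record's `input` field."""
    return PROMPT_FORMAT.format(input=record["input"])


def build_example(
    record: dict, tokenize: Callable[[str], list[int]]
) -> tuple[list[int], list[int]]:
    """Return (input_ids, labels) with the prompt tokens masked out."""
    prompt_ids = tokenize(render_prompt(record))
    output_ids = tokenize(record["output"])
    # train_on_inputs: false -> only the output tokens carry real labels
    labels = [IGNORE_INDEX] * len(prompt_ids) + output_ids
    return prompt_ids + output_ids, labels


# Hypothetical whitespace "tokenizer": ids assigned in order of first appearance.
vocab: dict[str, int] = {}


def toy_tokenize(text: str) -> list[int]:
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


record = {"input": "The food was tasty", "output": "hypothetical target text"}
input_ids, labels = build_example(record, toy_tokenize)
print(render_prompt(record))  # [INST] The food was tasty [/INST]
print(input_ids)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(labels)  # [-100, -100, -100, -100, -100, -100, 6, 7, 8]
```

The labels are not shifted here because causal language models in Transformers shift them internally when computing the loss.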

# llama-3-8B-semeval2014

This model is a LoRA adapter for [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B), fine-tuned on the SemEval-2014 Task 4 dataset for aspect-based sentiment analysis.
It achieves the following results on the evaluation set:
- Loss: 0.0695
- F1 Score: 82.13

For more details, see my [article](https://simmering.dev/open-absa).

## Intended uses & limitations

Aspect-based sentiment analysis of English review sentences. Wrap each sentence in the `[INST]` and `[/INST]` tags, for example: `[INST]The cheeseburger was tasty but the fries were soggy.[/INST]`

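## How to run

This adapter expects two extra tokens, `[INST]` and `[/INST]`, to be added to the tokenizer, and the base model's embedding matrix to be enlarged by two rows so that it matches the new tokenizer size. The adapter also stores trained copies of `embed_tokens` and `lm_head` (see `lora_modules_to_save` in the config), so the sizes must line up before the adapter is loaded.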

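```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "NousResearch/Meta-Llama-3-8B"
adapter_id = "psimm/llama-3-8B-semeval2014"
extra_tokens = ["[INST]", "[/INST]"]
device = "cuda" if torch.cuda.is_available() else "cpu"

# Add the two control tokens used during fine-tuning.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.add_special_tokens({"additional_special_tokens": extra_tokens})

# Load the base model in bf16 (training used bf16) and grow the embedding
# matrix to match the enlarged tokenizer (two extra rows).
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
base_model.resize_token_embeddings(len(tokenizer))

# Attach the LoRA adapter, which also carries the trained embed_tokens and lm_head.
model = PeftModel.from_pretrained(base_model, adapter_id).to(device)
model.eval()

input_text = "[INST]The food was tasty[/INST]"
inputs = tokenizer(input_text, return_tensors="pt").to(device)

with torch.no_grad():
    gen_tokens = model.generate(
        **inputs,  # passes input_ids and attention_mask
        max_new_tokens=256,
        do_sample=False,  # greedy decoding for deterministic output
        pad_token_id=tokenizer.eos_token_id,
    )

# Remove the prompt tokens, keeping only the newly generated ones
output_tokens = gen_tokens[:, inputs["input_ids"].shape[1] :]

print(tokenizer.batch_decode(output_tokens, skip_special_tokens=True))
```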

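## Training and evaluation data

Review sentences from SemEval-2014 Task 4 (aspect-based sentiment analysis). 5% of the training file was held out as the validation set (`val_set_size: 0.05`).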

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32 (per device)
- eval_batch_size: 32 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: AdamW (`adamw_torch`) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4
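The batch-size and scheduler entries above combine as in this worked sketch. It assumes the cosine-with-warmup formula used by Transformers' `get_cosine_schedule_with_warmup` (linear warmup, then half a cosine period down to zero) and a total of 356 optimizer steps, which is an estimate derived from the results table (about 89 steps per epoch over 4 epochs) rather than a logged value.

```python
import math

# Values from the hyperparameter list above.
micro_batch_size = 32  # per-device batch size
gradient_accumulation_steps = 1
num_devices = 2
base_lr = 1e-4
warmup_steps = 10
num_epochs = 4

# Assumption: ~89 optimizer steps per epoch, read off the results table
# (step 90 is logged at epoch 1.0112).
steps_per_epoch = 89
total_steps = steps_per_epoch * num_epochs

# Effective batch size = per-device batch x accumulation steps x devices.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices


def cosine_lr(step: int) -> float:
    """Linear warmup to base_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch_size)  # 64
print(total_steps)  # 356
for step in (0, 5, 10, 183, 356):
    print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the learning rate reaches 1e-4 at step 10, falls to half of that at step 183 (the midpoint of the decay phase), and reaches zero at the end of epoch 4.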

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.5408 | 0.0112 | 1 | 2.2742 |
| 0.1159 | 0.2022 | 18 | 0.1026 |
| 0.1028 | 0.4045 | 36 | 0.0762 |
| 0.0813 | 0.6067 | 54 | 0.0709 |
| 0.0908 | 0.8090 | 72 | 0.0665 |
| 0.0431 | 1.0112 | 90 | 0.0639 |
| 0.0275 | 1.2135 | 108 | 0.0663 |
| 0.0224 | 1.4157 | 126 | 0.0659 |
| 0.0349 | 1.6180 | 144 | 0.0637 |
| 0.0281 | 1.8202 | 162 | 0.0589 |
| 0.0125 | 2.0225 | 180 | 0.0592 |
| 0.0088 | 2.2247 | 198 | 0.0682 |
| 0.0076 | 2.4270 | 216 | 0.0666 |
| 0.01 | 2.6292 | 234 | 0.0654 |
| 0.0131 | 2.8315 | 252 | 0.0704 |
| 0.0075 | 3.0337 | 270 | 0.0679 |
| 0.002 | 3.2360 | 288 | 0.0688 |
| 0.0029 | 3.4382 | 306 | 0.0692 |
| 0.0009 | 3.6404 | 324 | 0.0694 |
| 0.0064 | 3.8427 | 342 | 0.0695 |
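The 18-step spacing of the rows follows from `eval_steps: 0.05` in the axolotl config: Transformers treats an `eval_steps` value below 1 as a fraction of the total training steps and rounds the result up. The check below reproduces the Step and Epoch columns, assuming the same estimate of about 89 steps per epoch read from the table itself; the first row (step 1) is not covered by this schedule.

```python
import math

# Assumption: steps per epoch estimated from the table (step 90 at epoch 1.0112).
steps_per_epoch = round(90 / 1.0112)  # 89
total_steps = steps_per_epoch * 4  # num_epochs: 4

# eval_steps: 0.05 is a ratio of total steps; Transformers rounds it up.
eval_every = math.ceil(total_steps * 0.05)

eval_at_steps = list(range(eval_every, total_steps + 1, eval_every))
eval_epochs = [round(step / steps_per_epoch, 4) for step in eval_at_steps]

print(eval_every)  # 18
print(eval_at_steps[0], eval_at_steps[-1], len(eval_at_steps))  # 18 342 19
print(eval_epochs[0], eval_epochs[-1])  # 0.2022 3.8427
```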


### Framework versions

- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.2.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1