
Llama-2-7b-hf-IDMGSP

This model is a LoRA adapter for meta-llama/Llama-2-7b-hf, fine-tuned on the tum-nlp/IDMGSP dataset. It achieves the following results on the evaluation split:

  • Loss: 0.1450
  • Accuracy: 0.9759036144578314
  • F1: 0.9758125472411187
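
For reference, both metrics can be computed from true and predicted label ids as in the sketch below. It assumes binary F1 with label 1 (AI generated) as the positive class; the card does not state which F1 averaging was used, and the sample labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def binary_f1(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, NOT taken from the IDMGSP evaluation split.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

acc = accuracy(y_true, y_pred)
f1 = binary_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```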

Model description

The base model was loaded in 4-bit quantization mode and fine-tuned using LoRA.

Intended uses & limitations

Labels: 0 = non-AI generated, 1 = AI generated.

The model is intended for classifying text as AI generated or not. Code to run inference:

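import torch
import transformers
import bitsandbytes as bnb  # backend required for 4-bit loading
from peft import LoraConfig, TaskType

# Label mapping from this card: 0 = non-AI generated, 1 = AI generated
id2label = {0: "non-AI generated", 1: "AI generated"}

class Model():
    def __init__(self, name) -> None:
        # Repository id or local path to load the tokenizer and model from
        self.name = name

        # Tokenizer (Llama has no pad token, so the EOS token is reused)
        self.tokenizer = transformers.LlamaTokenizer.from_pretrained(self.name)
        self.tokenizer.pad_token = self.tokenizer.eos_token
        print(f"Tokenizer: {self.tokenizer.eos_token}; Pad {self.tokenizer.pad_token}")

        # Model: 4-bit NF4 quantization with double quantization
        bnb_config = transformers.BitsAndBytesConfig(
            load_in_4bit = True,
            bnb_4bit_use_double_quant = True,
            bnb_4bit_quant_type = "nf4",
            bnb_4bit_compute_dtype = "bfloat16",
        )
        # LoRA configuration (stored for reference; not used by predict)
        self.peft_config = LoraConfig(
            task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16, lora_dropout=0.05, bias="none"
        )
        self.model = transformers.LlamaForSequenceClassification.from_pretrained(self.name,
            num_labels=2,
            quantization_config = bnb_config,
            device_map = "auto"
            )
        self.model.config.pad_token_id = self.model.config.eos_token_id

    def tokenize(self, text):
        # Assumed helper (missing from the original snippet): encode the text
        # as PyTorch tensors and move them to the model's device
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        return inputs.to(self.model.device)

    def predict(self, text):
        inputs = self.tokenize(text)
        with torch.no_grad():  # inference only, no gradients needed
            outputs = self.model(**inputs)
        logits = outputs.logits
        predictions = torch.argmax(logits, dim=-1)
        return id2label[predictions.item()]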
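
The final step of predict reduces the two output logits to a label id via argmax. The sketch below shows the same step in plain Python and adds a softmax to obtain a confidence score. The label strings follow the mapping stated on this card; the function names and the sample logits are hypothetical.

```python
import math

# Label mapping as stated on this card.
ID2LABEL = {0: "non-AI generated", 1: "AI generated"}


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[label_id], probs[label_id]


# Hypothetical logits for a single input, not produced by the real model.
label, confidence = decode([-1.2, 2.3])
print(label, round(confidence, 3))
```

A softmax score like this is only a rough indication of confidence; it is not a calibrated probability.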

Training and evaluation data

The classifier_input subset of the tum-nlp/IDMGSP dataset.

Training procedure

Training hyperparameters

[Image: BitsAndBytes and LoRA config parameters]

GPU VRAM consumption during fine-tuning: 30.6 GB

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 5
  • mixed_precision_training: Native AMP
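
The linear schedule with warmup can be sketched as a plain function. This assumes 500 warmup steps (in the Hugging Face Trainer, a positive warmup_steps takes precedence over warmup_ratio) and 2490 total optimizer steps, as reported in the training results. The function name is illustrative and not part of any library.

```python
def linear_lr(step, peak_lr=1e-4, warmup_steps=500, total_steps=2490):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from the peak to 0 at the last step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 1495, 2490):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at step 500, roughly the end of the first epoch, and is back to half the peak around step 1495.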

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|---|
| 0.0766 | 1.0 | 498 | 0.1165 | 0.9614708835341366 | 0.9612813721780804 |
| 0.182 | 2.0 | 996 | 0.0934 | 0.9657379518072289 | 0.9648059816939539 |
| 0.037 | 3.0 | 1494 | 0.1190 | 0.9716365461847389 | 0.9710182097973841 |
| 0.0349 | 4.0 | 1992 | 0.1884 | 0.96875 | 0.9692326702088224 |
| 0.0046 | 5.0 | 2490 | 0.1450 | 0.9759036144578314 | 0.9758125472411187 |
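
The results can also be compared programmatically, for instance to see which epoch one would pick under different criteria. The sketch below hard-codes the values reported above; the variable names are illustrative, and the card does not say which criterion, if any, was used to select the final checkpoint.

```python
# (epoch, validation_loss, accuracy, f1), copied from the training results
results = [
    (1, 0.1165, 0.9614708835341366, 0.9612813721780804),
    (2, 0.0934, 0.9657379518072289, 0.9648059816939539),
    (3, 0.1190, 0.9716365461847389, 0.9710182097973841),
    (4, 0.1884, 0.96875, 0.9692326702088224),
    (5, 0.1450, 0.9759036144578314, 0.9758125472411187),
]

# Two common selection criteria that can disagree with each other.
best_by_loss = min(results, key=lambda row: row[1])
best_by_accuracy = max(results, key=lambda row: row[2])

print("lowest validation loss: epoch", best_by_loss[0])
print("highest accuracy: epoch", best_by_accuracy[0])
```

Validation loss is lowest after epoch 2, while accuracy and F1 are highest after epoch 5, which is the checkpoint whose metrics are reported at the top of this card.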

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.0.1
  • Datasets 2.14.6
  • Tokenizers 0.14.1
