Model Card
Overview
This model is a fine-tuned version of Meta's Llama-2-7B model, specifically trained for hallucination detection.
Task
This model performs token classification, where each token in a sentence is classified as either:
- correct (0): The token is part of factual information.
- hallucinated (1): The token is part of hallucinated or incorrect information.
Example Usage:
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
# Load the model and tokenizer
model_name = "nicksnlp/llama-7B-hallucination"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
def infer_with_model(input_text):
    # Tokenize the input text
    inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=128)

    # Move input tensors to the same device as the model
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    # Predict the token labels (hallucination vs. correct)
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits  # Raw logits output from the model

    # Get the predicted labels (0 for correct, 1 for hallucinated)
    predicted_labels = torch.argmax(logits, dim=-1)

    # Recover the tokens the model actually saw; tokenizer.tokenize() would drop
    # the special tokens and misalign with the predictions
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    # Pair each token with its predicted label
    labeled_tokens = list(zip(tokens, predicted_labels[0].tolist()))

    # Collect the tokens flagged as hallucinated
    hallucinated_words = [token for token, label in labeled_tokens if label == 1]
    return hallucinated_words

# Example input
input_text = "Alexanderplatz is located in London City, it has been there since 1966."
hallucinated_words = infer_with_model(input_text)
print("Hallucinated words:", hallucinated_words)
print([tokenizer.convert_tokens_to_string([word]) for word in hallucinated_words])
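The function above returns raw tokenizer pieces (Llama uses SentencePiece, so word-initial pieces carry a leading "▁"). The helper below is a minimal sketch, not part of the original card, that groups consecutive flagged pieces into readable spans; the function name and the span-grouping logic are illustrative assumptions.

# Hypothetical helper: merge consecutive hallucinated pieces into readable spans.
def group_hallucinated_spans(input_text):
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=128)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    labels = torch.argmax(logits, dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    spans, current = [], []
    for token, label in zip(tokens, labels):
        if token in tokenizer.all_special_tokens:
            continue                       # skip <s>, </s>, etc.
        if label == 1:
            current.append(token)          # extend the current hallucinated span
        elif current:
            spans.append(tokenizer.convert_tokens_to_string(current))
            current = []
    if current:
        spans.append(tokenizer.convert_tokens_to_string(current))
    return spans

print(group_hallucinated_spans("Alexanderplatz is located in London City, it has been there since 1966."))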
Training Data:
The model was trained on a small dataset with labeled examples of correct and hallucinated tokens. A few examples from the dataset:
[
  {"text": "The Eiffel Tower is located in Berlin, Germany.", "labels": [0, 0, 0, 0, 0, 0, 1, 1]},  # Hallucinated words: "Berlin", "Germany"
  {"text": "The capital of France is Paris.", "labels": [0, 0, 0, 0, 0, 0]},  # Correct sentence
  {"text": "The Amazon River flows through Asia.", "labels": [0, 0, 0, 0, 0, 1]}  # Hallucinated word: "Asia"
]
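The labels in these examples are per word, while the model predicts per subword token. Below is a minimal sketch, assuming this word-level format and the fast tokenizer loaded above, of expanding the labels with word_ids(); the align_labels helper and the -100 masking convention are assumptions, not the card's original preprocessing script.

# Hypothetical preprocessing: expand word-level labels to subword tokens.
def align_labels(example):
    words = example["text"].split()
    encoding = tokenizer(words, is_split_into_words=True, truncation=True, max_length=128)
    token_labels = []
    for word_id in encoding.word_ids():
        if word_id is None:
            token_labels.append(-100)                        # special tokens: ignored by the loss
        else:
            token_labels.append(example["labels"][word_id])  # each piece inherits its word's label
    encoding["labels"] = token_labels
    return encoding

aligned = align_labels({"text": "The Amazon River flows through Asia.", "labels": [0, 0, 0, 0, 0, 1]})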
Model Details
- Base Model: Llama-2-7B
- Task: Token Classification
- Labels:
- correct (0)
- hallucinated (1)
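If the uploaded config does not already carry these label names, they can be attached explicitly so downstream tools report "correct"/"hallucinated" instead of generic LABEL_0/LABEL_1; this is a small convenience sketch, not something the card specifies.

# Assumed label mapping, mirroring the list above.
model.config.id2label = {0: "correct", 1: "hallucinated"}
model.config.label2id = {"correct": 0, "hallucinated": 1}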
Training Parameters
- Model Name: nicksnlp/llama-7B-hallucination
- Base Model: meta-llama/Llama-2-7b-hf
- Task: Token Classification
- Batch Size: 8
- Epochs: 1
- Learning Rate: 2e-4
- Optimizer: paged_adamw_8bit
- Gradient Accumulation Steps: 2
- Max Sequence Length: 128
- Weight Decay: 0.001
- Warmup Ratio: 0.3
- Save Steps: 300
- Logging Steps: 10
- Max Gradient Norm: 0.3
- FP16: false
- BF16: false
- Device: GPU
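As a sketch of how these hyperparameters map onto transformers.TrainingArguments (not the author's original training script: output_dir is a placeholder, and the max sequence length of 128 is applied at tokenization time rather than here):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # placeholder, not from the original card
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",          # requires bitsandbytes
    weight_decay=0.001,
    warmup_ratio=0.3,
    max_grad_norm=0.3,
    save_steps=300,
    logging_steps=10,
    fp16=False,
    bf16=False,
)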
PEFT Configuration
- LoRA Alpha: 8
- LoRA Dropout: 0.1
- Rank (r): 16
- Bias: "none"
- Target Modules: ["q_proj", "v_proj"]
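A sketch of the corresponding peft setup; task_type is an assumption based on the token-classification head, while the other values mirror the configuration above.

from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=16,
    lora_alpha=8,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_proj", "v_proj"],
    task_type="TOKEN_CLS",               # assumed from the token-classification task
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()  # shows how few parameters LoRA trains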
Training Framework
- LoRA (Low-Rank Adaptation)
Citation
For the original Llama model:
@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and others},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}
For the fine-tuned version:
@misc{nicksnlp2024llama_hallucination,
  author = {Nikolay Vorontsov},
  title = {Fine-tuning Llama-2-7B for Hallucination Detection},
  year = {2024},
  url = {https://huggingface.co/nicksnlp/llama-7B-hallucination}
}