Model Card
Overview
This model is a fine-tuned version of Meta's Llama-2-7B model, specifically trained for hallucination detection.
Task
This model performs token classification, where each token in a sentence is classified as either:
- correct (0): The token is part of factual information.
- hallucinated (1): The token is part of hallucinated or incorrect information.
Example Usage:
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
# Load the model and tokenizer
model_name = "nicksnlp/llama-7B-hallucination"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
def infer_with_model(input_text):
    # Tokenize the input text
    inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=128)

    # Move input tensors to the same device as the model
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    # Predict the token labels (hallucination vs. correct)
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits  # Raw logits output from the model

    # Get the predicted labels (0 for correct, 1 for hallucinated)
    predicted_labels = torch.argmax(logits, dim=-1)

    # Recover the tokens the model actually saw; tokenizer.tokenize() would drop
    # the special tokens and misalign with the predictions
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    # Pair each token with its predicted label
    labeled_tokens = list(zip(tokens, predicted_labels[0].tolist()))

    # Collect the tokens flagged as hallucinated
    hallucinated_words = [token for token, label in labeled_tokens if label == 1]
    return hallucinated_words

# Example input
input_text = "Alexanderplatz is located in London City, it has been there since 1966."
hallucinated_words = infer_with_model(input_text)
print("Hallucinated words:", hallucinated_words)
print([tokenizer.convert_tokens_to_string([word]) for word in hallucinated_words])
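The function above returns raw tokenizer pieces (Llama uses SentencePiece, so word-initial pieces carry a leading "▁"). The helper below is a minimal sketch, not part of the original card, that groups consecutive flagged pieces into readable spans; the function name and the span-grouping logic are illustrative assumptions.

# Hypothetical helper: merge consecutive hallucinated pieces into readable spans.
def group_hallucinated_spans(input_text):
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=128)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    labels = torch.argmax(logits, dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    spans, current = [], []
    for token, label in zip(tokens, labels):
        if token in tokenizer.all_special_tokens:
            continue                       # skip <s>, </s>, etc.
        if label == 1:
            current.append(token)          # extend the current hallucinated span
        elif current:
            spans.append(tokenizer.convert_tokens_to_string(current))
            current = []
    if current:
        spans.append(tokenizer.convert_tokens_to_string(current))
    return spans

print(group_hallucinated_spans("Alexanderplatz is located in London City, it has been there since 1966."))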
Training Data:
The model was trained on a small dataset with labeled examples of correct and hallucinated tokens. A few examples from the dataset:
[
  {"text": "The Eiffel Tower is located in Berlin, Germany.", "labels": [0, 0, 0, 0, 0, 0, 1, 1]},  # Hallucinated words: "Berlin", "Germany"
  {"text": "The capital of France is Paris.", "labels": [0, 0, 0, 0, 0, 0]},  # Correct sentence
  {"text": "The Amazon River flows through Asia.", "labels": [0, 0, 0, 0, 0, 1]}  # Hallucinated word: "Asia"
]
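The labels in these examples are per word, while the model predicts per subword token. Below is a minimal sketch, assuming this word-level format and the fast tokenizer loaded above, of expanding the labels with word_ids(); the align_labels helper and the -100 masking convention are assumptions, not the card's original preprocessing script.

# Hypothetical preprocessing: expand word-level labels to subword tokens.
def align_labels(example):
    words = example["text"].split()
    encoding = tokenizer(words, is_split_into_words=True, truncation=True, max_length=128)
    token_labels = []
    for word_id in encoding.word_ids():
        if word_id is None:
            token_labels.append(-100)                        # special tokens: ignored by the loss
        else:
            token_labels.append(example["labels"][word_id])  # each piece inherits its word's label
    encoding["labels"] = token_labels
    return encoding

aligned = align_labels({"text": "The Amazon River flows through Asia.", "labels": [0, 0, 0, 0, 0, 1]})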
Model Details
- Base Model: Llama-2-7B
- Task: Token Classification
- Labels:
- correct (0)
- hallucinated (1)
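If the uploaded config does not already carry these label names, they can be attached explicitly so downstream tools report "correct"/"hallucinated" instead of generic LABEL_0/LABEL_1; this is a small convenience sketch, not something the card specifies.

# Assumed label mapping, mirroring the list above.
model.config.id2label = {0: "correct", 1: "hallucinated"}
model.config.label2id = {"correct": 0, "hallucinated": 1}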
Training Parameters
- Model Name: nicksnlp/llama-7B-hallucination
- Base Model: meta-llama/Llama-2-7b-hf
- Task: Token Classification
- Batch Size: 8
- Epochs: 1
- Learning Rate: 2e-4
- Optimizer: paged_adamw_8bit
- Gradient Accumulation Steps: 2
- Max Sequence Length: 128
- Weight Decay: 0.001
- Warmup Ratio: 0.3
- Save Steps: 300
- Logging Steps: 10
- Max Gradient Norm: 0.3
- FP16: false
- BF16: false
- Device: GPU
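As a sketch of how these hyperparameters map onto transformers.TrainingArguments (not the author's original training script: output_dir is a placeholder, and the max sequence length of 128 is applied at tokenization time rather than here):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # placeholder, not from the original card
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",          # requires bitsandbytes
    weight_decay=0.001,
    warmup_ratio=0.3,
    max_grad_norm=0.3,
    save_steps=300,
    logging_steps=10,
    fp16=False,
    bf16=False,
)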
PEFT Configuration
- LoRA Alpha: 8
- LoRA Dropout: 0.1
- Rank (r): 16
- Bias: "none"
- Target Modules: ["q_proj", "v_proj"]
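A sketch of the corresponding peft setup; task_type is an assumption based on the token-classification head, while the other values mirror the configuration above.

from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=16,
    lora_alpha=8,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_proj", "v_proj"],
    task_type="TOKEN_CLS",               # assumed from the token-classification task
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()  # shows how few parameters LoRA trains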
Training Framework
- LoRA (Low-Rank Adaptation)
Citation
For the original Llama model:
@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and others},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}
For the fine-tuned version:
@misc{nicksnlp2024llama_hallucination,
  author = {Nikolay Vorontsov},
  title = {Fine-tuning Llama-2-7B for Hallucination Detection},
  year = {2024},
  url = {https://huggingface.co/nicksnlp/llama-7B-hallucination}
}