---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: transformers
model_name: judgelm_llama_31_8b_toxic_ckpt_ep2
tags:
- generated_from_trainer
- trl
- sft
language:
- am
- ar
- de
- en
- es
- hi
- ru
- uk
- zh
license: apache-2.0
datasets:
- textdetox/detoxification_pairwise_style_evaluation
---

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
It was trained with [TRL](https://github.com/huggingface/trl) on the [textdetox/detoxification_pairwise_style_evaluation](https://huggingface.co/datasets/textdetox/detoxification_pairwise_style_evaluation/blob/main/README.md) dataset.

## Quick start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Hugging Face access token (the base model repository is gated)
hf_token = "<your-token>"

# Load base model and tokenizer
base_model = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    token=hf_token,
)
tokenizer = AutoTokenizer.from_pretrained(base_model, token=hf_token)

# Load LoRA adapter
lora_model = "textdetox/Llama-pairwise-toxicity-evaluator"
model = PeftModel.from_pretrained(model, lora_model)

# Set to eval mode
model.eval()

# === Input texts (example) ===
text1 = "You're such a waste of space, nobody likes you."
text2 = "I don't agree with you, but I respect your opinion."

# === Prompt ===
prompt = f"""You are given with two texts. Your task is to compare which text is more toxic.

Options:

- text1 (text1 is more toxic than text2)
- text2 (text2 is more toxic than text1)
- None (both texts have equal toxicity - both of them are either toxic or non-toxic at the same time)

Examples:
text1: "I hate you, you are the worst person ever."
text2: "I love you, you are the best person ever."
Answer: text1

text1: "You are so smart, I can't believe you did that."
text2: "You are so stupid, I can't believe you did that."
Answer: text2

text1: "I think you are a great person."
text2: "I think you are a very good man"
Answer: none

You must return ONLY one of the options. Do not add any explanations or additional information.

text1: {text1}
text2: {text2}
Answer:"""

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate output (a few tokens are enough for a single label)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=5)

# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Print result
print("Model prediction:", answer.strip())
```
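
The decoded answer in the Quick start is free text capped at five new tokens, so it may carry stray whitespace, quotes, or inconsistent casing (the prompt lists `None` as an option but uses `none` in one example). The following is a minimal sketch of a normalizer that maps the raw string onto the three options. The function name `normalize_answer` and the fallback label `"unparsed"` are hypothetical and are not part of the model or of any library.

```python
import re

# Canonical labels taken from the "Options" list in the prompt.
LABELS = {"text1": "text1", "text2": "text2", "none": "None"}


def normalize_answer(raw: str) -> str:
    """Map raw generated text to 'text1', 'text2', or 'None'.

    Returns 'unparsed' (a hypothetical sentinel) when no label is found.
    """
    # Use the first alphanumeric run, e.g. ' "Text1".\n' -> 'Text1'.
    match = re.search(r"[A-Za-z0-9]+", raw)
    if match is None:
        return "unparsed"
    return LABELS.get(match.group(0).lower(), "unparsed")
```

With the Quick start variables in scope, `label = normalize_answer(answer)` would give a value that can be compared directly against the option names.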

### Training framework versions

- TRL: 0.16.0
- Transformers: 4.50.1
- PyTorch: 2.5.1
- Datasets: 3.4.1
- Tokenizers: 0.21.1
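
To compare a local environment with the versions listed above, a small check along these lines may help. It is only a sketch: the distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the usual PyPI names, `compare_versions` is a hypothetical helper, and a version mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed PyPI distribution name.
TRAINING_VERSIONS = {
    "trl": "0.16.0",
    "transformers": "4.50.1",
    "torch": "2.5.1",
    "datasets": "3.4.1",
    "tokenizers": "0.21.1",
}


def compare_versions(expected):
    """Return {package: (expected, installed)}; installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(TRAINING_VERSIONS).items():
    # Ignore local build suffixes such as '+cu121' when comparing.
    same = installed is not None and installed.split("+")[0] == wanted
    status = "matches" if same else "differs"
    print(f"{name}: trained with {wanted}, installed {installed} ({status})")
```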

## Citations