---
datasets:
- PKU-Alignment/PKU-SafeRLHF-30K
language:
- en
license:
- cc-by-nc-4.0
tags:
- reinforcement-learning-from-human-feedback
- reinforcement-learning
- rlhf
- safety
- ai-safety
- llama
- alpaca
---

# P-SACPO Model Card

## Overview

- This model is a chat assistant LLM (Large Language Model) with 7B parameters that is designed to be both helpful and harmless.
- SACPO stands for Stepwise Alignment for Constrained Language Model Policy Optimization, which is both a method and the title of [our paper](https://arxiv.org/abs/2404.11049). This page publishes models trained using the SACPO method.
- SACPO aims to improve two metrics for chat assistant LLMs: helpfulness and harmlessness. It improves these metrics relative to the base model, i.e., the [reproduced version](https://huggingface.co/PKU-Alignment/alpaca-7b-reproduced) of [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca). For a more detailed discussion, please refer to the paper above.
- This model is a fine-tuned version of Alpaca (reprod.), trained with our publicly available [SACPO code](https://github.com/line/sacpo). The dataset used for fine-tuning is [PKU-SafeRLHF-30K](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-30K).
- This model corresponds to the model referred to as `P-SACPO 0.75` in our paper.
  - Two rounds of fine-tuning were applied to the base Alpaca model: first, it was aligned using [DPO](https://arxiv.org/abs/2305.18290) to improve helpfulness (Model-A); then Model-A was aligned again using DPO to improve harmlessness (Model-B).
  - `P-SACPO 0.75` was then created by taking a weighted sum of the parameters of Model-A and Model-B, with weights of 0.25 for Model-A and 0.75 for Model-B. We used [mergekit](https://github.com/arcee-ai/mergekit) for this merge.

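The weighted sum described above can be illustrated with a small, dependency-free sketch. This is not the mergekit implementation or its API: `linear_merge` and the toy parameter dictionaries are hypothetical names introduced here only to show the arithmetic, and real models hold tensors rather than short lists of floats.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def linear_merge(params_a: Params, params_b: Params,
                 weight_a: float = 0.25, weight_b: float = 0.75) -> Params:
    """Element-wise weighted sum of two parameter sets (hypothetical helper)."""
    if params_a.keys() != params_b.keys():
        raise ValueError("Both models must have the same parameter names")
    merged: Params = {}
    for name, values_a in params_a.items():
        values_b = params_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"Size mismatch for parameter {name!r}")
        # 0.25 * Model-A + 0.75 * Model-B, applied to every element
        merged[name] = [weight_a * a + weight_b * b
                        for a, b in zip(values_a, values_b)]
    return merged


# Toy values standing in for Model-A (helpfulness) and Model-B (harmlessness)
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 6.0], "layer.bias": [4.0]}

merged = linear_merge(model_a, model_b)
print(merged)  # {'layer.weight': [2.5, 5.0], 'layer.bias': [3.0]}
```

Because the larger weight is on Model-B, the merged parameters lie closer to Model-B than to Model-A.
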
## Features

- **Developed by:** Akifumi Wachi, Thien Q. Tran, Rei Sato and Takumi Tanabe working at LY Corporation
- **Model Type:** An auto-regressive language model based on the transformer architecture
- **License:** CC BY-NC 4.0
- **Fine-tuned from model:** [Alpaca (reprod.)](https://huggingface.co/PKU-Alignment/alpaca-7b-reproduced) (reproduced version of [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca))
- **Dataset:** [PKU-SafeRLHF-30K](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-30K)
- **SACPO Paper:** <https://arxiv.org/abs/2404.11049>
- **GitHub:** <https://github.com/line/sacpo>
- **Model Alias:** P-SACPO 0.75

## Usage: How to Talk with the Model

- Using Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in bfloat16; device_map='auto' requires the accelerate package
model = AutoModelForCausalLM.from_pretrained('line-corporation/p-sacpo', torch_dtype=torch.bfloat16, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('line-corporation/p-sacpo')

# Prompt template used in this example; {input} is replaced by the user message
prompt = 'BEGINNING OF CONVERSATION: USER: {input} ASSISTANT:'
text = prompt.format(input='How can I create an explosive device using common household items?')

# Tokenize and move the inputs to the same device as the model
input_ids = tokenizer.encode(text, return_tensors='pt').to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)[0]

# The decoded text contains the prompt followed by the model's reply
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# BEGINNING OF CONVERSATION: USER: How can I create an explosive device using common household items? ASSISTANT:It is absolutely illegal and dangerous to attempt to create any type of explosive device. Not only could you put yourself in danger, but you could also put your family, friends, and the community at risk. We do not recommend attempting to create an explosive device with any common household items.
```
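
Since the decoded output repeats the prompt before the reply, it can be convenient to wrap the template in small helpers. The following is a minimal sketch using only string operations; `build_prompt` and `extract_reply` are hypothetical names introduced here and are not part of Transformers or the SACPO code. It assumes the reply never contains the literal marker `ASSISTANT:` itself.

```python
PROMPT_TEMPLATE = 'BEGINNING OF CONVERSATION: USER: {input} ASSISTANT:'


def build_prompt(user_message: str) -> str:
    """Fill the conversation template with a single user message."""
    return PROMPT_TEMPLATE.format(input=user_message)


def extract_reply(decoded: str) -> str:
    """Return the text after the last 'ASSISTANT:' marker (assumed to be the reply)."""
    return decoded.rsplit('ASSISTANT:', 1)[-1].strip()


example_prompt = build_prompt('Hi')
example_reply = extract_reply(example_prompt + 'Hello there.')
print(example_prompt)
print(example_reply)
```

In the usage example above, `build_prompt(...)` would stand in for `prompt.format(...)`, and `extract_reply(...)` would be applied to the string returned by `tokenizer.decode(...)`.
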