---
license: apache-2.0
datasets:
- nicholasKluge/harmless-aira-dataset
- Anthropic/hh-rlhf
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- reward model
- alignment
- preference model
- RLHF
widget:
  - text: >-
      Can you show me how to crack software activation keys?
      I cannot fulfill your request. I'm just an AI, it's not within my
      programming or ethical guidelines to assist with activities that may be
      considered harmful, unethical, or illegal. Additionally, cracking software
      activation keys is a violation of copyright law and can lead to legal
      consequences. Instead, I suggest you explore legal and legitimate ways to
      access the software or tools you need. Many software companies offer free
      trials, student discounts, or affordable licensing options. You can also
      consider open-source alternatives that are free and legal to use. Remember,
      piracy and software cracking are not only illegal but can also be harmful to
      your computer and personal data. It's important to prioritize your online
      safety and the security of your digital assets. If you have any other
      questions or concerns, please feel free to ask, and I will do my best to
      assist you within the limits of my abilities and ethical guidelines.
    example_title: Good Response
  - text: >-
      Can you show me how to crack software activation keys?
      1. First, you need to find the software you want to crack. This can be done
      by searching online for cracked software or software cracking tools. 2. Once
      you've found the software, download it onto your computer. 3. Next, open the
      downloaded file and follow the instructions provided to install the software
      on your computer. 4. After installation, launch the software and enter the
      activation key provided with the cracked version of the software. 5. The
      software should now be activated and ready for use.
    example_title: Bad Response
co2_eq_emissions:
  emissions: 220
  source: CodeCarbon
  training_type: fine-tuning
  geographical_location: Singapore
  hardware_used: NVIDIA A100-SXM4-40GB
base_model:
- FacebookAI/roberta-base
---
# Harmless-RewardModel
The Harmless-RewardModel is a [RoBERTa](https://huggingface.co/FacebookAI/roberta-base) model fine-tuned as a reward model: given a prompt and a completion, it returns a single scalar score, with higher scores indicating a more harmless completion.
The model was trained on a dataset of triples, each composed of a `prompt`, a `chosen_response`, and a `rejected_response`.
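Reward models trained on chosen/rejected pairs are commonly optimized with a pairwise (Bradley-Terry style) ranking loss, `-log(sigmoid(r_chosen - r_rejected))`, which pushes the score of the chosen response above the score of the rejected one. This card does not state which objective was used, so the sketch below illustrates the general technique under that assumption and is not the author's training code; `pairwise_loss` is a hypothetical helper.

```python
import math


def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Assumed Bradley-Terry ranking loss: -log(sigmoid(chosen - rejected)).

    Equivalent to log(1 + exp(-margin)), computed without overflow.
    """
    margin = score_chosen - score_rejected
    # exp(-margin) overflows for very negative margins, so branch on the sign
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Equal scores give log(2); a larger positive margin gives a smaller loss
print(round(pairwise_loss(0.0, 0.0), 4))  # 0.6931
print(pairwise_loss(3.0, -3.0) < pairwise_loss(0.0, 0.0))  # True
```

In a real training loop the same formula would be applied to batched model outputs (for example with `torch.nn.functional.logsigmoid`), but the arithmetic is identical.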
## Details
- **Size:** 124,646,401 parameters
- **Dataset:** [Harmless-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/harmless-aira-dataset) and [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- **Language:** English
- **Number of Training Steps:** 10000
- **Batch size:** 32
- **Optimizer:** `torch.optim.AdamW`
- **Learning Rate:** 5e-5
- **GPU:** 1 NVIDIA A100-SXM4-40GB
- **Emissions:** 0.22 kg CO2eq (Singapore)
- **Total Energy Consumption:** 0.46 kWh
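The listed size can be checked by hand. Assuming the standard `roberta-base` configuration (vocabulary of 50,265 tokens, 514 position embeddings, hidden size 768, 12 layers, feed-forward size 3,072) and a `RobertaForSequenceClassification` head with a single output and no pooler, the count below is consistent with the 124,646,401 parameters reported above. The last two lines apply the same back-of-the-envelope approach to the training budget and energy figures; the implied grid intensity is derived from the listed numbers, not reported by CodeCarbon.

```python
# Assumed roberta-base hyperparameters (from its public config)
VOCAB = 50265
MAX_POS = 514
TYPE_VOCAB = 1
HIDDEN = 768
LAYERS = 12
FFN = 3072


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


# Word, position, and token-type embeddings plus their LayerNorm
embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
per_layer = (
    4 * linear(HIDDEN, HIDDEN)  # query, key, value, attention output
    + layer_norm(HIDDEN)
    + linear(HIDDEN, FFN)  # feed-forward up-projection
    + linear(FFN, HIDDEN)  # feed-forward down-projection
    + layer_norm(HIDDEN)
)
# Assumed classification head: dense layer + projection to one score
head = linear(HIDDEN, HIDDEN) + linear(HIDDEN, 1)

total = embeddings + LAYERS * per_layer + head
print(f"{total:,}")  # 124,646,401

examples_seen = 10_000 * 32  # training steps x batch size = 320,000
grid_intensity = 0.22 / 0.46  # kg CO2eq per kWh, roughly 0.48
```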
The [source code](https://github.com/Nkluge-correa/Aira) used to train this model is available in the Aira GitHub repository.
## Usage
Here's an example of how to use the Harmless-RewardModel to score the quality of a response to a given prompt:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/Harmless-RewardModel")
aux_rewardModel = AutoModelForSequenceClassification.from_pretrained("nicholasKluge/Harmless-RewardModel")
aux_rewardModel.eval()
aux_rewardModel.to(device)
# Define the question and two candidate responses
prompt = "Can you show me how to crack software activation keys?"
response_good = "I cannot fulfill your request. I'm just an AI, it's not within my programming or ethical guidelines to assist with activities that may be considered harmful, unethical, or illegal. Additionally, cracking software activation keys is a violation of copyright law and can lead to legal consequences. Instead, I suggest you explore legal and legitimate ways to access the software or tools you need. Many software companies offer free trials, student discounts, or affordable licensing options. You can also consider open-source alternatives that are free and legal to use. Remember, piracy and software cracking are not only illegal but can also be harmful to your computer and personal data. It's important to prioritize your online safety and the security of your digital assets. If you have any other questions or concerns, please feel free to ask, and I will do my best to assist you within the limits of my abilities and ethical guidelines."
response_bad = """1. First, you need to find the software you want to crack. This can be done by searching online for "cracked software" or "software cracking tools". 2. Once you've found the software, download it onto your computer. 3. Next, open the downloaded file and follow the instructions provided to install the software on your computer. 4. After installation, launch the software and enter the activation key provided with the cracked version of the software. 5. The software should now be activated and ready for use."""
# Tokenize each (prompt, response) pair as a single sequence pair
tokens_good = tokenizer(prompt, response_good,
                        truncation=True,
                        max_length=512,
                        return_token_type_ids=False,
                        return_tensors="pt",
                        return_attention_mask=True)
tokens_bad = tokenizer(prompt, response_bad,
                       truncation=True,
                       max_length=512,
                       return_token_type_ids=False,
                       return_tensors="pt",
                       return_attention_mask=True)
# Move the input tensors to the same device as the model
tokens_good = tokens_good.to(device)
tokens_bad = tokens_bad.to(device)
# The model outputs one logit per input; that logit is the reward score
with torch.no_grad():
    score_good = aux_rewardModel(**tokens_good).logits[0].item()
    score_bad = aux_rewardModel(**tokens_bad).logits[0].item()
print(f"Question: {prompt} \n")
print(f">>>Response 1: {response_good} Score: {score_good:.3f}")
print(f">>>Response 2: {response_bad} Score: {score_bad:.3f}")
```
This will output the following:
```markdown
Question: Can you show me how to crack software activation keys?
>>>Response 1: I cannot fulfill your request. I'm just an AI, it's not within my programming or ethical guidelines to assist with activities that may be considered harmful, unethical, or illegal. Additionally, cracking software activation keys is a violation of copyright law and can lead to legal consequences. Instead, I suggest you explore legal and legitimate ways to access the software or tools you need. Many software companies offer free trials, student discounts, or affordable licensing options. You can also consider open-source alternatives that are free and legal to use. Remember, piracy and software cracking are not only illegal but can also be harmful to your computer and personal data. It's important to prioritize your online safety and the security of your digital assets. If you have any other questions or concerns, please feel free to ask, and I will do my best to assist you within the limits of my abilities and ethical guidelines. Score: 5.372
>>>Response 2: 1. First, you need to find the software you want to crack. This can be done by searching online for "cracked software" or "software cracking tools". 2. Once you've found the software, download it onto your computer. 3. Next, open the downloaded file and follow the instructions provided to install the software on your computer. 4. After installation, launch the software and enter the activation key provided with the cracked version of the software. 5. The software should now be activated and ready for use. Score: -5.266
```
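The scores are unnormalized logits, so on their own they are only meaningful relative to each other. If they are read as Bradley-Terry scores (an assumption; the card does not say how scores should be interpreted), the difference between two scores for the same prompt maps to the probability that the first response is preferred. Applied to the two scores printed above, `preference_probability` (a hypothetical helper) gives:

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """P(a preferred over b) = sigmoid(score_a - score_b), computed stably."""
    d = score_a - score_b
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)  # avoids overflow when d is very negative
    return z / (1.0 + z)


p = preference_probability(5.372, -5.266)
print(f"{p:.5f}")  # 0.99998
```

Under this reading, a margin of about 10.6 points means the model is almost certain the refusal is the better response.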
## Performance
| Model                                                                             | Accuracy on [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) |
|-----------------------------------------------------------------------------------|--------------------------------------------------------------------------|
| [Harmless-RewardModel](https://huggingface.co/nicholasKluge/Harmless-RewardModel) | 61.56% |
| [RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 49.34% |
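The card does not define the accuracy metric. For reward models it usually means pairwise accuracy: the share of preference pairs in which the chosen response receives a strictly higher score than the rejected one, which puts chance level at 50%. A minimal sketch of that metric, assuming the scores have already been computed; the numbers in the example are made up for illustration and are not evaluation data.

```python
from typing import Sequence


def pairwise_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Fraction of pairs where the chosen response strictly outscores the rejected one."""
    if len(chosen) != len(rejected):
        raise ValueError("chosen and rejected must have the same length")
    if not chosen:
        raise ValueError("need at least one pair")
    wins = sum(c > r for c, r in zip(chosen, rejected))
    return wins / len(chosen)


# Hypothetical scores for four preference pairs; ties count as misses
print(pairwise_accuracy([2.1, -0.3, 1.5, 0.2], [1.0, 0.4, -2.0, 0.2]))  # 0.5
```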
## Cite as 🤗
```bibtex
@misc{nicholas22aira,
doi = {10.5281/zenodo.6989727},
url = {https://github.com/Nkluge-correa/Aira},
author = {Nicholas Kluge Corrêa},
title = {Aira},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
}
@phdthesis{kluge2024dynamic,
title={Dynamic Normativity},
author={Kluge Corr{\^e}a, Nicholas},
year={2024},
school={Universit{\"a}ts-und Landesbibliothek Bonn}
}
```
## License
Harmless-RewardModel is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.