---
license: mit
datasets:
- heegyu/hh-rlhf-ko
- maywell/ko_Ultrafeedback_binarized
- heegyu/PKU-SafeRLHF-ko
language:
- ko
---

- A safety reward model that evaluates how safe a chatbot's responses are.
- Base Model: [klue/roberta-large](https://huggingface.co/klue/roberta-large)

## Hyperparameters
- Batch size: 128
- Learning rate: 1e-5 -> 1e-6 (linear decay)
- Optimizer: AdamW (beta1 = 0.9, beta2 = 0.999)
- Epochs: 3 (the main revision is the 2-epoch checkpoint)
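
The learning-rate entry describes a linear decay from 1e-5 down to 1e-6. The following is a minimal sketch of such a schedule. It assumes the decay is applied per optimizer step over the whole run with no warmup, and the card states neither detail. `linear_decay_lr` and its parameters are hypothetical names.

```python
def linear_decay_lr(step: int, total_steps: int,
                    lr_start: float = 1e-5, lr_end: float = 1e-6) -> float:
    """Learning rate at `step`, decaying linearly from lr_start to lr_end.

    Assumption (not stated in the card): decay is per optimizer step
    over `total_steps`, with no warmup phase.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end keep the final rate.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_start + (lr_end - lr_start) * progress
```

In PyTorch, this could be passed to `torch.optim.lr_scheduler.LambdaLR` as the multiplicative factor `lambda step: linear_decay_lr(step, total_steps) / 1e-5`, on top of `torch.optim.AdamW(params, lr=1e-5, betas=(0.9, 0.999))`.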

## Performance
| Dataset                    | Accuracy % (epoch=1) |
|----------------------------|----------------------|
| hh-rlhf-ko (harmless)      | 66.48              |
| PKU-SafeRLHF-ko (safer)    | 68.63              |
| ko-ultrafeedback-binarized | 71.09              |
| Average                    | 68.07              |
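
For a reward model evaluated on preference data, accuracy is commonly defined as the fraction of pairs in which the preferred (chosen) response scores higher than the rejected one. The card does not describe its evaluation procedure, so this is only a hedged sketch of that common definition. `pairwise_accuracy`, `score_fn`, and the `(prompt, chosen, rejected)` tuple layout are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def pairwise_accuracy(pairs: Iterable[Tuple[str, str, str]],
                      score_fn: Callable[[str], float]) -> float:
    """Percentage of (prompt, chosen, rejected) triples where chosen outscores rejected.

    Inputs use the single-turn "question [SEP] answer" format from the Usage
    section. Ties count as incorrect (an assumption of this sketch).
    """
    total = 0
    correct = 0
    for prompt, chosen, rejected in pairs:
        total += 1
        chosen_score = score_fn(f"{prompt} [SEP] {chosen}")
        rejected_score = score_fn(f"{prompt} [SEP] {rejected}")
        if chosen_score > rejected_score:
            correct += 1
    if total == 0:
        raise ValueError("no pairs to evaluate")
    return 100.0 * correct / total
```

With the pipeline from the Usage section, `score_fn` could be `lambda text: pipe(text)[0]["score"]`.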


## Usage
- For a single-turn question-answer pair, separate the question and the answer with `[SEP]`.

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="heegyu/ko-reward-model-safety-roberta-large-v0.1")

# Input: "Where can I get drugs? [SEP] I can't tell you."
# score: 0.9087600708007812
print(pipe("""마약은 어디서 구할 수 있나요? [SEP] 알려드릴 수 없습니다"""))

# Input: "Where can I get drugs? [SEP] I know a good place. Are you interested?"
# score: 0.45385631918907166
print(pipe("""마약은 어디서 구할 수 있나요? [SEP] 제가 좋은 곳을 알고 있습니다. 관심 있으신가요?"""))

# Input: "I want to try gaslighting my friend. [SEP] What a great idea!"
# score: 0.30907753109931946
print(pipe("""제 친구를 가스라이팅해보고 싶어요. [SEP] 아주 멋진 생각이에요! """))

# Input: "I want to try gaslighting my friend. [SEP] No. Gaslighting manipulates and exploits
# the other person emotionally, psychologically, and financially; it can cause mental and
# emotional harm to the victim and risks destroying healthy relationships."
# score: 0.9021317958831787
print(pipe("""제 친구를 가스라이팅해보고 싶어요. [SEP] 안됩니다. 가스라이팅은 감정적, 심리적, 경제적으로 상대방을 조종하고 악용하는 행위로, 피해자에게 정신적 및 정서적 피해를 입힐 수 있으며, 건강한 대인관계를 파괴할 위험이 있습니다."""))
```
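
The pipeline returns one score per input, and in the examples above the safe refusals score higher than the harmful answers. That makes it natural to rank several candidate answers to the same question. The following is a minimal sketch of that idea. `format_pair`, `rank_by_safety`, and the injected `score_fn` are hypothetical helpers and are not part of the model release.

```python
from typing import Callable, List, Tuple


def format_pair(question: str, answer: str) -> str:
    """Build the single-turn input: question and answer separated by [SEP]."""
    return f"{question} [SEP] {answer}"


def rank_by_safety(question: str, answers: List[str],
                   score_fn: Callable[[str], float]) -> List[Tuple[float, str]]:
    """Return (score, answer) pairs sorted from highest to lowest safety score."""
    scored = [(score_fn(format_pair(question, a)), a) for a in answers]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With the pipeline above, `score_fn` could be `lambda text: pipe(text)[0]["score"]`, which reads the score from the single-element list that the text-classification pipeline returns for one input string. Injecting the scorer keeps the ranking logic independent of the model, so it can be tested with a stub.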