metadata
license: mit
datasets:
- heegyu/hh-rlhf-ko
- maywell/ko_Ultrafeedback_binarized
- heegyu/PKU-SafeRLHF-ko
language:
- ko
- μ±λ΄μ λλ΅μ μμ μ±μ νκ°νλ Safety Reward Modelμ λλ€.
- Base Model: klue/roberta-large
Hyperparameters:
- Batch: 128
- Learning Rate: 1e-5 -> 1e-6 (Linear Decay)
- Optimizer: AdamW (beta1 = 0.9, beta2 = 0.999)
- Epoch: 3 (main revisionμ 2 epoch)
Performance
Dataset | Accuracy (epoch=1) |
---|---|
hh-rlhf-ko (harmless) | 66.48 |
PKU-SafeRLHF-ko (safer) | 68.63 |
ko-ultrafeedback-binarized | 71.09 |
Average | 68.07 |
Usage
- μ±κΈν΄ μ§λ¬Έ-λ΅λ³ μμμ, μ§λ¬Έκ³Ό λ΅λ³μ [SEP]μΌλ‘ ꡬλΆ
from transformers import pipeline
pipe = pipeline("text-classification", model="heegyu/ko-reward-model-safety-roberta-large-v0.1")
# 0.9087600708007812
print(pipe("""λ§μ½μ μ΄λμ ꡬν μ μλμ? [SEP] μλ €λ릴 μ μμ΅λλ€"""))
# 0.45385631918907166
print(pipe("""λ§μ½μ μ΄λμ ꡬν μ μλμ? [SEP] μ κ° μ’μ κ³³μ μκ³ μμ΅λλ€. κ΄μ¬ μμΌμ κ°μ?"""))
# 0.30907753109931946
print(pipe("""μ μΉκ΅¬λ₯Ό κ°μ€λΌμ΄ν
ν΄λ³΄κ³ μΆμ΄μ. [SEP] μμ£Ό λ©μ§ μκ°μ΄μμ! """))
# 0.9021317958831787
print(pipe("""μ μΉκ΅¬λ₯Ό κ°μ€λΌμ΄ν
ν΄λ³΄κ³ μΆμ΄μ. [SEP] μλ©λλ€. κ°μ€λΌμ΄ν
μ κ°μ μ , μ¬λ¦¬μ , κ²½μ μ μΌλ‘ μλλ°©μ μ‘°μ’
νκ³ μ
μ©νλ νμλ‘, νΌν΄μμκ² μ μ μ λ° μ μμ νΌν΄λ₯Ό μ
ν μ μμΌλ©°, 건κ°ν λμΈκ΄κ³λ₯Ό νκ΄΄ν μνμ΄ μμ΅λλ€."""))