A multi-label model trained on K-MHaS, based on "koelectra-v3".
You can load the tokenizer for this model from "monologg/koelectra-base-v3-discriminator".
dataset : https://huggingface.co/datasets/jeanlee/kmhas_korean_hate_speech
pretrained_model : https://huggingface.co/monologg/koelectra-base-v3-discriminator
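Loading the tokenizer can be sketched as follows, assuming the `transformers` library is installed. The helper name `load_tokenizer` is hypothetical, and the repository ID is the one from the pretrained_model link.

```python
# Repository ID taken from the pretrained_model link.
TOKENIZER_ID = "monologg/koelectra-base-v3-discriminator"


def load_tokenizer(tokenizer_id: str = TOKENIZER_ID):
    """Download (or read from the local cache) the KoELECTRA tokenizer.

    Hypothetical helper, not part of the model repository.
    """
    # Imported lazily so this snippet can be imported without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(tokenizer_id)
```

Calling `load_tokenizer()` returns a tokenizer object; calling that object on a string, for example with `truncation=True`, yields the `input_ids` and `attention_mask` to feed to the model.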
The label map is as follows.
>>>
{'origin': 0,
'physical': 1,
'politics': 2,
'profanity': 3,
'age': 4,
'gender': 5,
'race': 6,
'religion': 7,
'not_hate_speech': 8}
You can load the label map with the code below.
>
import pickle

from huggingface_hub import hf_hub_download

repo_id = "JunHwi/kmhas_multilabel"
filename = "kmhas_dict.pickle"  # name of the file uploaded to the repo_id above

# Download the pickle file (cached locally) and get its local path
label_dict = hf_hub_download(repo_id, filename)
with open(label_dict, "rb") as f:
    label2num = pickle.load(f)
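Once the label map is loaded, it can be inverted to turn multi-label scores back into label names. The sketch below assumes the model's outputs have already been converted to one probability per label (for example with a sigmoid) and that 0.5 is a reasonable cut-off; the function name `decode_predictions`, the threshold, and the example scores are illustrative assumptions, not values from the model repository.

```python
# Label map as listed in this model card.
label2num = {
    "origin": 0,
    "physical": 1,
    "politics": 2,
    "profanity": 3,
    "age": 4,
    "gender": 5,
    "race": 6,
    "religion": 7,
    "not_hate_speech": 8,
}


def decode_predictions(scores, label2num, threshold=0.5):
    """Return the names of all labels whose score reaches the threshold.

    `scores` is a sequence of per-label probabilities, ordered by label index.
    The 0.5 default is an assumption; tune it on validation data.
    """
    # Invert the map: index -> label name.
    num2label = {index: name for name, index in label2num.items()}
    return [num2label[i] for i, score in enumerate(scores) if score >= threshold]


# Made-up scores for a single sentence, for illustration only.
example_scores = [0.02, 0.01, 0.91, 0.77, 0.03, 0.10, 0.05, 0.01, 0.04]
print(decode_predictions(example_scores, label2num))
# ['politics', 'profanity']
```

Because the task is multi-label, each label is thresholded independently instead of taking a single argmax, so one sentence can receive several labels or none.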