A multi-label hate speech classification model for K-MHaS, fine-tuned from "koelectra-v3".

Use the tokenizer from "monologg/koelectra-base-v3-discriminator" with this model.
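Below is a minimal inference sketch. It assumes that `JunHwi/kmhas_multilabel` holds a `transformers`-format sequence-classification checkpoint with nine outputs in the label-map order listed below, which this card does not state explicitly. The sketch pairs that checkpoint with the tokenizer named above and applies an independent sigmoid to each logit, which is the usual choice for a multi-label head. It then keeps every label whose probability reaches a threshold. The 0.5 default is an assumption, not a value from this card. `probs_to_labels`, `predict`, and `_load` are hypothetical helper names.

```python
from functools import lru_cache
from typing import Dict, List, Sequence

TOKENIZER_ID = "monologg/koelectra-base-v3-discriminator"
# Assumption: this repo contains a transformers-compatible classification checkpoint.
MODEL_ID = "JunHwi/kmhas_multilabel"


def probs_to_labels(
    probs: Sequence[float], id2label: Dict[int, str], threshold: float = 0.5
) -> List[str]:
    """Multi-label decoding: keep every label whose probability is >= threshold."""
    return [id2label[i] for i, p in enumerate(probs) if p >= threshold]


@lru_cache(maxsize=1)
def _load():
    # Imported lazily so probs_to_labels works without transformers installed;
    # the cache ensures the tokenizer and model are downloaded/loaded only once.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()
    return tokenizer, model


def predict(text: str, id2label: Dict[int, str], threshold: float = 0.5) -> List[str]:
    import torch

    tokenizer, model = _load()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    # Multi-label: an independent sigmoid per class, not a softmax across classes.
    probs = torch.sigmoid(logits).tolist()
    return probs_to_labels(probs, id2label, threshold)
```

`predict(text, id2label)` returns a list of label names. The list is empty if no probability reaches the threshold. If validation data is available, you can tune the threshold per label instead of using one global value.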

Dataset: https://huggingface.co/datasets/jeanlee/kmhas_korean_hate_speech

Pretrained model: https://huggingface.co/monologg/koelectra-base-v3-discriminator

The label map is as follows:

{'origin': 0,
 'physical': 1,
 'politics': 2,
 'profanity': 3,
 'age': 4,
 'gender': 5,
 'race': 6,
 'religion': 7,
 'not_hate_speech': 8}
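A single K-MHaS example can carry several of these labels at once. For training or evaluation, its label ids are therefore usually turned into a multi-hot vector over the nine classes. The sketch below assumes that the dataset's `label` column holds a list of integer ids in the same order as the map above. `to_multi_hot` and `add_targets` are hypothetical names. The targets are floats because `torch.nn.BCEWithLogitsLoss`, the common multi-label loss, expects floating-point targets.

```python
from typing import Dict, List, Sequence

NUM_LABELS = 9  # ids 0..8 in the label map above


def to_multi_hot(label_ids: Sequence[int], num_labels: int = NUM_LABELS) -> List[float]:
    """Turn class ids such as [2, 3] into a multi-hot float vector of length num_labels."""
    target = [0.0] * num_labels
    for i in label_ids:
        if not 0 <= i < num_labels:
            raise ValueError(f"label id {i} out of range 0..{num_labels - 1}")
        target[i] = 1.0
    return target


def add_targets(example: Dict) -> Dict:
    # Hypothetical helper for datasets' .map(); assumes a "label" column of id lists.
    example["labels"] = to_multi_hot(example["label"])
    return example
```

With the `datasets` library, you could add the targets to every split with something like `load_dataset("jeanlee/kmhas_korean_hate_speech").map(add_targets)`.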

The label map is also uploaded to this repo as a pickle file. You can load it with the code below:

```python
import pickle

from huggingface_hub import hf_hub_download

repo_id = "JunHwi/kmhas_multilabel"
filename = "kmhas_dict.pickle"  # name of the file uploaded to the repo_id above

# hf_hub_download returns the local path of the downloaded (cached) file
label_dict = hf_hub_download(repo_id, filename)

with open(label_dict, "rb") as f:
    label2num = pickle.load(f)
```
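To decode predictions, you also need the inverse map, from id to label name. This sketch hardcodes the label map shown above so that it runs on its own. In practice, you would pass in the `label2num` loaded from `kmhas_dict.pickle`. `invert_label_map` is a hypothetical helper. It also checks that the ids are unique and run contiguously from 0.

```python
from typing import Dict

# Same mapping as documented above; replace with the unpickled label2num in practice.
label2num: Dict[str, int] = {
    "origin": 0, "physical": 1, "politics": 2, "profanity": 3, "age": 4,
    "gender": 5, "race": 6, "religion": 7, "not_hate_speech": 8,
}


def invert_label_map(label2num: Dict[str, int]) -> Dict[int, str]:
    """Build an id -> label map, checking that ids are unique and cover 0..N-1."""
    num2label = {num: label for label, num in label2num.items()}
    # Duplicate ids collapse keys, and gaps break the range, so either fails here.
    if sorted(num2label) != list(range(len(label2num))):
        raise ValueError("label ids must be unique and cover 0..N-1")
    return num2label


num2label = invert_label_map(label2num)
```

In `transformers`, both dictionaries can also be passed as `from_pretrained(..., id2label=num2label, label2id=label2num)`. The model config then carries readable label names.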