Multi-label hate speech classification model trained on the K-MHaS dataset, based on "koelectra-v3".

You can load the tokenizer for this model from "monologg/koelectra-base-v3-discriminator".

dataset : https://huggingface.co/datasets/jeanlee/kmhas_korean_hate_speech

pretrained_model : https://huggingface.co/monologg/koelectra-base-v3-discriminator

The label map is as follows.
>>>
    {'origin': 0,
     'physical': 1,
     'politics': 2,
     'profanity': 3,
     'age': 4,
     'gender': 5,
     'race': 6,
     'religion': 7,
     'not_hate_speech': 8}
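
Because the model is multi-label, a single comment can carry several of these labels at once. The sketch below shows one way to turn a multi-hot vector back into label names by inverting the map. It assumes the vector has one position per label, ordered by the indices above; `decode_multi_hot` and the example vector are hypothetical and not part of this repository.

```python
# Label map copied from the model card above.
label2num = {
    "origin": 0, "physical": 1, "politics": 2, "profanity": 3, "age": 4,
    "gender": 5, "race": 6, "religion": 7, "not_hate_speech": 8,
}

# Invert the map so that class indices can be turned back into names.
num2label = {index: name for name, index in label2num.items()}


def decode_multi_hot(vector):
    """Return the label names whose position in `vector` is set to 1."""
    return [num2label[i] for i, flag in enumerate(vector) if flag == 1]


# Hypothetical multi-hot vector with 'profanity' (3) and 'gender' (5) active.
example = [0, 0, 0, 1, 0, 1, 0, 0, 0]
print(decode_multi_hot(example))  # ['profanity', 'gender']
```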

You can load the label map with the code below.
>
    
    import pickle

    from huggingface_hub import hf_hub_download

    repo_id = "JunHwi/kmhas_multilabel"

    filename = "kmhas_dict.pickle"  # name of the file uploaded to the repo_id above

    # hf_hub_download returns the local path of the downloaded file
    label_dict = hf_hub_download(repo_id, filename)

    with open(label_dict, "rb") as f:
        label2num = pickle.load(f)
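
Once `label2num` is loaded, it can be used to name the model's predictions. The following is a minimal sketch, assuming the model emits one logit per label and that a sigmoid with a 0.5 threshold is an acceptable decision rule; neither is stated in this card. The helper `logits_to_labels`, the threshold, and the example logits are hypothetical, and the dictionary is written out inline as a stand-in for the pickle contents, which are assumed to match the label map shown above.

```python
import math


def logits_to_labels(logits, label2num, threshold=0.5):
    """Apply a sigmoid to each logit and keep labels whose score reaches `threshold`."""
    num2label = {index: name for name, index in label2num.items()}
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [num2label[i] for i, score in enumerate(scores) if score >= threshold]


# Stand-in for the dictionary loaded from kmhas_dict.pickle (assumed contents).
label2num = {
    "origin": 0, "physical": 1, "politics": 2, "profanity": 3, "age": 4,
    "gender": 5, "race": 6, "religion": 7, "not_hate_speech": 8,
}

# Hypothetical logits for one sentence, one value per label index.
logits = [-3.2, -4.0, -2.5, 2.1, -3.7, 0.8, -4.1, -3.9, -2.0]
print(logits_to_labels(logits, label2num))  # ['profanity', 'gender']
```

The threshold is a tunable choice: raising it trades recall for precision on each label.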