Multi-label hate speech classification model trained on the K-MHaS dataset, fine-tuned from "koelectra-v3".

You can load the tokenizer for this model from "monologg/koelectra-base-v3-discriminator".

dataset : https://huggingface.co/datasets/jeanlee/kmhas_korean_hate_speech

pretrained_model : https://huggingface.co/monologg/koelectra-base-v3-discriminator

The label map is as follows.
>>>
    {'origin': 0,
     'physical': 1,
     'politics': 2,
     'profanity': 3,
     'age': 4,
     'gender': 5,
     'race': 6,
     'religion': 7,
     'not_hate_speech': 8}
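
Because the model is multi-label, a single comment can carry several of these ids at once. The sketch below shows one common way to turn a list of label ids into a fixed-length multi-hot vector. The helper name `to_multi_hot` and the example comment are hypothetical, and the model card does not state how targets were actually encoded during training.

```python
# Label map copied from this model card.
label2num = {
    "origin": 0,
    "physical": 1,
    "politics": 2,
    "profanity": 3,
    "age": 4,
    "gender": 5,
    "race": 6,
    "religion": 7,
    "not_hate_speech": 8,
}


def to_multi_hot(label_ids, num_labels=len(label2num)):
    """Turn a list of label ids into a fixed-length 0/1 vector (hypothetical helper)."""
    vector = [0.0] * num_labels
    for idx in label_ids:
        vector[idx] = 1.0
    return vector


# Hypothetical example: a comment tagged with both 'politics' and 'profanity'.
example_ids = [label2num["politics"], label2num["profanity"]]
multi_hot = to_multi_hot(example_ids)
print(multi_hot)
```

Each position in the vector corresponds to the id in the label map, so position 2 is 'politics' and position 3 is 'profanity'.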

You can load the label map with the code below.
>
    
    import pickle

    from huggingface_hub import hf_hub_download

    repo_id = "JunHwi/kmhas_multilabel"

    filename = "kmhas_dict.pickle"  # name of the file uploaded to the repo_id above

    # Download the pickled label map (cached locally) and get its local path
    label_dict = hf_hub_download(repo_id, filename)

    with open(label_dict, "rb") as f:
        label2num = pickle.load(f)
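
Once the label map is loaded, it can be inverted to turn model outputs back into label names. The following is a minimal sketch, assuming the model emits one logit per label and that labels are selected with a sigmoid and a 0.5 threshold. Both are assumptions, not something this model card states. The `decode` helper and the logits are made up for illustration, and `label2num` is written out here as a stand-in for the dict loaded from the pickle file.

```python
import math

# Stand-in for the dict loaded from kmhas_dict.pickle (values copied from the label map).
label2num = {
    "origin": 0,
    "physical": 1,
    "politics": 2,
    "profanity": 3,
    "age": 4,
    "gender": 5,
    "race": 6,
    "religion": 7,
    "not_hate_speech": 8,
}

# Invert the map so ids can be translated back to names.
num2label = {idx: name for name, idx in label2num.items()}


def decode(logits, threshold=0.5):
    """Apply a sigmoid per label and keep labels at or above the threshold.

    The 0.5 threshold is an assumption; tune it on validation data.
    """
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [num2label[i] for i, p in enumerate(probs) if p >= threshold]


# Made-up logits for illustration only, not real model output.
logits = [-3.2, -2.8, 1.7, 2.4, -4.0, -3.5, -3.9, -4.1, -2.2]
predicted = decode(logits)
print(predicted)
```

Thresholding each label independently is what lets a multi-label model return zero, one, or several labels for the same input.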