This model is a binary classifier developed to analyze comment authorship patterns on Korean news articles. For further details, refer to our paper in Journalism: "News comment sections and online echo chambers: The ideological alignment between partisan news stories and their user comments".
- This model is a BERT classification model that assigns Korean user-generated comments one of two labels: liberal or conservative.
- This model was trained on approximately 37,000 user-generated comments collected from NAVER's news portal. Because the dataset was collected in 2019, comments on more recent political topics may not be classified correctly.
- This model is fine-tuned from ETRI's KorBERT.
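The binary labels are resolved through the model config at inference time. A minimal sketch for checking the index-to-label mapping (the exact label strings are not documented on this card, so print and verify them rather than assuming):

```python
from transformers import BertForSequenceClassification

# Print the id2label mapping shipped with the model config; the usage
# example below relies on it to turn an argmax index into a label name.
model = BertForSequenceClassification.from_pretrained('conviette/korPolBERT')
print(model.config.id2label)  # e.g. {0: '...', 1: '...'} -- verify the actual strings
```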
## How to use
- The model requires an edited version of the `transformers` class `BertTokenizer`, which is provided in the file `KorBertTokenizer.py` (see the download sketch after the usage example).
- Usage example:
```python
import torch
from transformers import BertForSequenceClassification

from KorBertTokenizer import KorBertTokenizer

# Load the edited tokenizer and the fine-tuned classifier from the Hub.
tokenizer = KorBertTokenizer.from_pretrained('conviette/korPolBERT')
model = BertForSequenceClassification.from_pretrained('conviette/korPolBERT')

def classify(text):
    # Tokenize to a fixed length of 70 tokens, run a forward pass without
    # gradients, and map the argmax logit to its label name.
    inputs = tokenizer(text, padding='max_length', max_length=70, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    return model.config.id2label[predicted_class_id]

# Example comments: "The leftists are ruining the country's economy and
# security" / "The far-right sold the country out to Japan".
input_strings = ['좌파가 나라 경제 안보 말아먹는다',
                 '수꼴들은 나라 일본한테 팔아먹었다']

for input_string in input_strings:
    # 입력 텍스트 = input text, 분류 결과 = classification result
    print('===\n입력 텍스트: {}\n분류 결과: {}\n==='.format(input_string, classify(input_string)))
```
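If `KorBertTokenizer.py` is not yet on your Python path, one way to fetch it is with `huggingface_hub`. This is a minimal sketch, assuming the file is hosted in this model repository; adjust `repo_id` or `filename` if the repo layout differs:

```python
from huggingface_hub import hf_hub_download

# Download KorBertTokenizer.py into the current directory so that
# `from KorBertTokenizer import KorBertTokenizer` resolves. The filename
# comes from this model card; the repo layout is an assumption.
path = hf_hub_download(repo_id='conviette/korPolBERT',
                       filename='KorBertTokenizer.py',
                       local_dir='.')
print('Tokenizer helper saved to:', path)
```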
## Model performance
- Accuracy: 0.8322
- F1-Score: 0.8322 (a sketch of computing such scores is given below)
- For further technical details on the model, refer to our paper from the W-NUT workshop (EMNLP 2019): "The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media".
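Scores like the above can be reproduced on a labeled held-out set with scikit-learn, reusing the `classify` helper from the usage example. This is a minimal sketch: `eval_data` is hypothetical placeholder data with assumed gold labels, not the paper's test set.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical (comment, gold label) pairs -- placeholders only; the
# assumed label strings should match model.config.id2label.
eval_data = [('좌파가 나라 경제 안보 말아먹는다', 'conservative'),
             ('수꼴들은 나라 일본한테 팔아먹었다', 'liberal')]

preds = [classify(text) for text, _ in eval_data]  # classify() defined above
golds = [label for _, label in eval_data]

print('Accuracy:', accuracy_score(golds, preds))
print('F1-Score:', f1_score(golds, preds, average='weighted'))  # weighted over both classes
```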