This model is a binary classifier developed to analyze comment authorship patterns on Korean news articles. For further details, refer to our paper in Journalism: "News comment sections and online echo chambers: The ideological alignment between partisan news stories and their user comments".

  • This model is a BERT classification model that labels Korean user-generated comments as either liberal or conservative.
  • This model was trained on approximately 37,000 user-generated comments collected from NAVER's news portal. The dataset was collected in 2019, so comments about more recent political topics may be misclassified.
  • This model is fine-tuned from ETRI's KorBERT.

How to use

  • The model requires a modified version of the transformers BertTokenizer class, which is provided in the file KorBertTokenizer.py.
  • Usage example:
from KorBertTokenizer import KorBertTokenizer
from transformers import BertForSequenceClassification
import torch

tokenizer = KorBertTokenizer.from_pretrained('conviette/korPolBERT')
model = BertForSequenceClassification.from_pretrained('conviette/korPolBERT')

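def classify(text):
    # Pad or truncate every comment to a fixed length of 70 tokens.
    inputs = tokenizer(text, padding='max_length', truncation=True, max_length=70, return_tensors='pt')

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax(dim=-1).item()
    return model.config.id2label[predicted_class_id]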


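# Sample Korean comments (the model expects Korean input).
# 1: "The leftists are ruining the country's economy and national security"
# 2: "Did the right-wing diehards sell the country off to Japan?"
input_strings = ['좌파가 나라 경제 안보 말아먹는다',
                 '수꼴들은 나라 일본한테 팔아먹었냐']

for input_string in input_strings:
    print('===\nInput text: {}\nClassification result: {}\n==='.format(input_string, classify(input_string)))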
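
The example above returns only the top label. When a confidence score is also useful, the logits can be passed through a softmax. The following is a minimal sketch in plain Python, not part of the released model code: the helper names `softmax` and `label_with_confidence`, the example logits, and the `LABEL_0`/`LABEL_1` mapping are all hypothetical placeholders. The real label names come from `model.config.id2label`.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtract the largest logit first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical values for illustration only: real logits come from the model,
# and real label names come from model.config.id2label.
example_logits = [0.3, 1.7]
example_id2label = {0: "LABEL_0", 1: "LABEL_1"}

label, confidence = label_with_confidence(example_logits, example_id2label)
print(f"{label} ({confidence:.2f})")
```

Inside `classify`, the logits for a single comment would be obtained with `model(**inputs).logits[0].tolist()` and passed to `label_with_confidence` together with `model.config.id2label`.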

Model performance
