metadata
library_name: transformers
base_model:
  - monologg/kobert

KoBERT-Based Korean Emotion Classification Model

This project contains code for training and using a KoBERT-based model that classifies the emotion of Korean text. Given an input text, the model predicts which of five emotions it expresses: anger (Anger), fear (Fear), joy (Happy), calm (Tender), or sadness (Sad).

1. Model Training Process

Colab Environment Setup and Data Preparation

  1. Install required libraries: install the transformers, datasets, torch, pandas, and scikit-learn libraries.

  2. Load the data: load the emotion-classification CSV file derived from the Korean emotional dialogue data registered on AI Hub.

  3. Prepare the dataset:

    • Train/validation split: 80% is used as training data and 20% as validation data.
    • Convert to HuggingFace Dataset format: convert the Pandas DataFrame to a HuggingFace Dataset.
    • Rename the label column: rename the label_int column, which holds the emotion label, to labels.
    • Tokenize the data: tokenize the input text with the monologg/kobert tokenizer.
    • Set the format: keep only input_ids, attention_mask, and labels so the data is ready for training.
  4. Model and training configuration:

    • Model: load the monologg/kobert model and configure it to classify 5 emotion labels.
    • Training hyperparameters:
      • learning_rate=2e-5, num_train_epochs=10, batch_size=16.
      • Save the best model based on F1 score.
      • Apply early stopping.
  5. Run training and save the model:

    • After training finishes, save the model to Google Drive.
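
The dataset preparation in step 3 can be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the project's actual notebook: the `text` column name, the sample rows, and `random_state=42` are assumptions, while `label_int` and the 80/20 ratio come from the steps above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the emotion CSV; the real file would be read with pd.read_csv.
# The "text" column name is an assumption; "label_int" is the name used in the steps above.
df = pd.DataFrame({
    "text": [f"sample sentence {i}" for i in range(10)],
    "label_int": [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
})

# 80/20 train/validation split (random_state is an arbitrary choice for repeatability).
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)

# Rename the label column from "label_int" to "labels", as described in step 3.
train_df = train_df.rename(columns={"label_int": "labels"}).reset_index(drop=True)
val_df = val_df.rename(columns={"label_int": "labels"}).reset_index(drop=True)

print(len(train_df), len(val_df))
```

From here, `datasets.Dataset.from_pandas(train_df)` would give the HuggingFace Dataset that the tokenization step operates on.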

Performance Evaluation and Testing

  • Evaluation metrics: compute accuracy and F1 score (macro and weighted).
  • Test data evaluation: evaluate the trained model on the test dataset.
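
As an illustration of the metrics listed above, the following sketch computes accuracy, macro F1, and weighted F1 with scikit-learn. The function name `compute_metrics`, the dictionary keys, and the toy logits are assumptions made for this example; the source does not show the actual evaluation code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    """Return accuracy plus macro and weighted F1 from a (logits, labels) pair."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # predicted class index per sample
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
        "f1_weighted": f1_score(labels, preds, average="weighted"),
    }

# Toy logits for four samples over the five emotion classes (made-up values).
logits = np.array([
    [2.0, 0.1, 0.1, 0.1, 0.1],  # argmax -> class 0
    [0.1, 2.0, 0.1, 0.1, 0.1],  # argmax -> class 1
    [0.1, 0.1, 2.0, 0.1, 0.1],  # argmax -> class 2
    [2.0, 0.1, 0.1, 0.1, 0.1],  # argmax -> class 0, but the true label is 1
])
labels = np.array([0, 1, 2, 1])

metrics = compute_metrics((logits, labels))
print(metrics)
```

If training uses the transformers `Trainer`, a function of this shape can be passed as its `compute_metrics` argument, and one of the returned keys can be named in `metric_for_best_model` alongside `load_best_model_at_end=True` and an `EarlyStoppingCallback`. Which F1 variant was used to select the best model is not stated in the source.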

2. How to Use the Model

Prerequisites

  • The trained model can be loaded from the HuggingFace Hub.
  • The model and tokenizer are based on monologg/kobert, and the classification labels are as follows:
    • Anger: 😡
    • Fear: 😨
    • Happy: 😊
    • Tender: 🥰
    • Sad: 😢

Usage Examples

  1. Emotion analysis of a single input sentence:

    • The model predicts the emotion of the text entered by the user and prints the probability of each emotion alongside it.
  2. Emotion analysis from an Excel file:

    • The specified text column and row range are read from an Excel file, and the emotion of each text is classified and printed.
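
The Excel workflow might look something like the sketch below. Everything named here is hypothetical: `classify_rows`, the `comment` column, and `dummy_predict`, a keyword-based stand-in used only so the example runs without downloading the model. In practice the DataFrame would come from `pd.read_excel` and the predictor would wrap the fine-tuned model.

```python
import pandas as pd

def classify_rows(df, text_column, start_row, end_row, predict_fn):
    """Classify df[text_column] for rows start_row..end_row (0-based, end exclusive)."""
    subset = df.iloc[start_row:end_row].copy()
    subset["emotion"] = [predict_fn(str(t)) for t in subset[text_column]]
    return subset[[text_column, "emotion"]]

def dummy_predict(text):
    """Placeholder predictor (NOT the real model): a keyword check for illustration."""
    return "Happy" if "행복" in text else "Sad"

# Stand-in for pd.read_excel("reviews.xlsx"); the file name is hypothetical.
df = pd.DataFrame({"comment": ["오늘 정말 행복해!", "너무 슬퍼", "행복한 하루", "우울해"]})

# Classify only rows 1 and 2 of the chosen text column.
result = classify_rows(df, "comment", 1, 3, dummy_predict)
print(result)
```

Passing the predictor in as a function keeps the row and column selection separate from the model call, so the same helper works with any classifier.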

Code Usage Example

# Import dependencies
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the KoBERT tokenizer and the fine-tuned classification model
tokenizer = AutoTokenizer.from_pretrained("monologg/kobert", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained("rkdaldus/ko-sent5-classification")

# Run emotion analysis on a user-supplied text
text = "오늘 정말 행복해!"  # "I'm so happy today!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_label = torch.argmax(outputs.logits, dim=1).item()

# Emotion label definitions: class index -> (name, emoji)
emotion_labels = {
    0: ("Anger", "😡"),
    1: ("Fear", "😨"),
    2: ("Happy", "😊"),
    3: ("Tender", "🥰"),
    4: ("Sad", "😢")
}

# Print the predicted emotion
print(f"Predicted emotion: {emotion_labels[predicted_label][0]} {emotion_labels[predicted_label][1]}")
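
The snippet above prints only the top label. To also report a probability for each emotion, as the usage examples describe, one option is to apply softmax to the logits. The sketch below assumes the class order matches the index mapping above and uses made-up logits in place of `outputs.logits`; `emotion_probabilities` is a hypothetical helper, not part of the published code.

```python
import torch

# Class order assumed to follow the index mapping used in the example above.
EMOTIONS = ["Anger", "Fear", "Happy", "Tender", "Sad"]

def emotion_probabilities(logits):
    """Map a (1, 5) logits tensor to a {label: probability} dict via softmax."""
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    return {label: probs[i].item() for i, label in enumerate(EMOTIONS)}

# Made-up logits standing in for outputs.logits from the model.
example_logits = torch.tensor([[0.2, -1.0, 3.1, 0.5, -0.3]])
probs = emotion_probabilities(example_logits)

# Print emotions from most to least likely.
for label, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {p:.3f}")
```

With the real model, `emotion_probabilities(outputs.logits)` would replace the made-up tensor.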