SMS Spam Classifier

ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ํ•œ๊ธ€ SMS๋ฅผ ์ง์ ‘ ๊ฐ€๊ณตํ•˜์—ฌ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ์…‹์ด ๊ถ๊ธˆํ•˜์‹œ๋ฉด, ๋ฌธ์˜ ์ฃผ์„ธ์š”.

์ด ๋ชจ๋ธ์€ SMS ์ŠคํŒธ ํƒ์ง€๋ฅผ ์œ„ํ•ด ๋ฏธ์„ธ ์กฐ์ •๋œ BERT ๊ธฐ๋ฐ˜ ๋‹ค๊ตญ์–ด ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. SMS ๋ฉ”์‹œ์ง€๋ฅผ ham(๋น„์ŠคํŒธ) ๋˜๋Š” **spam(์ŠคํŒธ)**์œผ๋กœ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Hugging Face Transformers ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ bert-base-multilingual-cased ๋ชจ๋ธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.


๋ชจ๋ธ ์„ธ๋ถ€์ •๋ณด

  • ๊ธฐ๋ณธ ๋ชจ๋ธ: bert-base-multilingual-cased
  • ํƒœ์Šคํฌ: ๋ฌธ์žฅ ๋ถ„๋ฅ˜(Sequence Classification)
  • ์ง€์› ์–ธ์–ด: ๋‹ค๊ตญ์–ด
  • ๋ผ๋ฒจ ์ˆ˜: 2 (ham, spam)
  • ๋ฐ์ดํ„ฐ์…‹: ํด๋ฆฐ๋œ SMS ์ŠคํŒธ ๋ฐ์ดํ„ฐ์…‹

๋ฐ์ดํ„ฐ์…‹ ์ •๋ณด

ํ›ˆ๋ จ ๋ฐ ํ‰๊ฐ€์— ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ์…‹์€ ham(๋น„์ŠคํŒธ) ๋˜๋Š” spam(์ŠคํŒธ)์œผ๋กœ ๋ผ๋ฒจ๋ง๋œ SMS ๋ฉ”์‹œ์ง€๋ฅผ ํฌํ•จํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ๋Š” ์ „์ฒ˜๋ฆฌ๋ฅผ ๊ฑฐ์นœ ํ›„ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ถ„๋ฆฌ๋˜์—ˆ์Šต๋‹ˆ๋‹ค:

  • ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ: 80%
  • ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ: 20%

Training Configuration

  • Learning rate: 2e-5
  • Batch size: 8 (per device)
  • Epochs: 1
  • Evaluation strategy: per epoch
  • Tokenizer: bert-base-multilingual-cased

์ด ๋ชจ๋ธ์€ Hugging Face์˜ Trainer API๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํšจ์œจ์ ์œผ๋กœ ๋ฏธ์„ธ ์กฐ์ •๋˜์—ˆ์Šต๋‹ˆ๋‹ค.


Usage

์ด ๋ชจ๋ธ์€ Hugging Face Transformers ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ†ตํ•ด ๋ฐ”๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("blockenters/sms-spam-classifier")
model = AutoModelForSequenceClassification.from_pretrained("blockenters/sms-spam-classifier")
model.eval()

# Sample input (a Korean spam message: "Congratulations! You've received a free trip to Bali. Reply WIN.")
text = "์ถ•ํ•˜ํ•ฉ๋‹ˆ๋‹ค! ๋ฌด๋ฃŒ ๋ฐœ๋ฆฌ ์—ฌํ–‰ ํ‹ฐ์ผ“์„ ๋ฐ›์œผ์…จ์Šต๋‹ˆ๋‹ค. WIN์ด๋ผ๊ณ  ํšŒ์‹ ํ•˜์„ธ์š”."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=-1)

# Decode the prediction
label_map = {0: "ham", 1: "spam"}
print(f"Prediction: {label_map[predictions.item()]}")
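
Alternatively, the high-level pipeline API wraps the same steps. Note that the printed label depends on the id2label mapping stored in the uploaded config; if that mapping was not saved, the output will show LABEL_0 / LABEL_1 instead of ham / spam.

from transformers import pipeline

# Same model via the text-classification pipeline.
classifier = pipeline("text-classification", model="blockenters/sms-spam-classifier")
print(classifier("์ถ•ํ•˜ํ•ฉ๋‹ˆ๋‹ค! ๋ฌด๋ฃŒ ๋ฐœ๋ฆฌ ์—ฌํ–‰ ํ‹ฐ์ผ“์„ ๋ฐ›์œผ์…จ์Šต๋‹ˆ๋‹ค. WIN์ด๋ผ๊ณ  ํšŒ์‹ ํ•˜์„ธ์š”."))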