KoELECTRA-small-v3-privacy-ner

This model is a fine-tuned version of monologg/koelectra-small-v3-discriminator on a synthesized privacy dataset. It achieves the following results on the evaluation set:

  • f1 = 0.9998728608843798
  • loss = 0.05310981854414328
  • precision = 0.9999237126509853
  • recall = 0.9998220142897098
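
The reported F1 is consistent with the harmonic mean of the reported precision and recall. The following is a minimal check, assuming all three figures were computed under the same averaging scheme (the card does not state which one); `f1_from` is a hypothetical helper name.

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results listed above.
precision = 0.9999237126509853
recall = 0.9998220142897098

f1 = f1_from(precision, recall)
print(f1)  # agrees with the reported f1 to within floating-point rounding
```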

Model description

Tagging scheme: BIO

  • -B (begin): the token starts an entity
  • -I (inside): the token is inside an entity
  • O (outside): the token is not part of an entity
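
To make the scheme concrete, here is a minimal sketch that groups a sequence of tags into entity spans. It assumes the suffix-style tag spelling used in this card (for example `PER-B`, `PER-I`, `O`); `bio_to_spans` is a hypothetical helper, not part of the model or of Transformers.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group suffix-style BIO tags into (entity_type, start, end) spans.

    `end` is exclusive. An "-I" tag that does not continue an open span of
    the same type is treated leniently as the start of a new span.
    """
    spans = []
    open_type, open_start = None, 0
    for i, tag in enumerate(tags):
        etype, _, marker = tag.rpartition("-")
        # An "-I" tag of the same type simply extends the open span.
        if marker == "I" and etype == open_type:
            continue
        # Anything else closes the span that is currently open, if any.
        if open_type is not None:
            spans.append((open_type, open_start, i))
            open_type = None
        # "-B" (or an orphan "-I") opens a new span; "O" opens nothing.
        if marker in ("B", "I") and etype:
            open_type, open_start = etype, i
    if open_type is not None:
        spans.append((open_type, open_start, len(tags)))
    return spans


tags = ["O", "PER-B", "PER-I", "O", "LOC-B", "LOC-I", "LOC-I"]
spans = bio_to_spans(tags)
print(spans)
```

Each span indexes into the token sequence, so `tokens[start:end]` recovers the tokens of one entity.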

Tag set for 12 Korean personal-information patterns

Category                      Tag   Definition
PERSON                        PER   Korean personal name
LOCATION                      LOC   Korean address
RESIDENT REGISTRATION NUMBER  RRN   Korean resident registration number
EMAIL                         EMA   Email address
ID                            ID    General login ID
PASSWORD                      PWD   General login password
ORGANIZATION                  ORG   Affiliated organization
PHONE NUMBER                  PHN   Phone number
CARD NUMBER                   CRD   Card number
ACCOUNT NUMBER                ACC   Bank account number
PASSPORT NUMBER               PSP   Passport number
DRIVER'S LICENSE NUMBER       DLN   Driver's license number
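
Combining the 12 tags with the BIO markers gives 25 labels (12 × 2 plus `O`). The sketch below builds such a label list; the ordering and the `label2id` mapping are assumptions made for illustration, and the authoritative mapping is whatever the loaded model exposes in `model.config.id2label`.

```python
# The 12 entity tags from the tag set above.
ENTITY_TAGS = ["PER", "LOC", "RRN", "EMA", "ID", "PWD",
               "ORG", "PHN", "CRD", "ACC", "PSP", "DLN"]

# Suffix-style BIO labels, as written in this card (e.g. "PER-B", "PER-I").
# NOTE: this ordering is an assumption; the model's real label order may differ.
labels = ["O"] + [f"{tag}-{marker}" for tag in ENTITY_TAGS for marker in ("B", "I")]

label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

print(len(labels))
```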

How to use

You can use this model with the Transformers pipeline for NER.

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("amoeba04/test1")
model = AutoModelForTokenClassification.from_pretrained("amoeba04/test1")
# Build a token-classification (NER) pipeline around them.
ner = pipeline("ner", model=model, tokenizer=tokenizer)

# English: "Last week, Mr. Hong Gil-dong attended an IT conference held at the
# Teheran-ro 101 Building in Gangnam-gu, Seoul."
example = "지난주, 홍길동 씨는 서울특별시 강남구에 위치한 테헤란로 101빌딩에서 진행된 IT 컨퍼런스에 참석했습니다."
ner_results = ner(example)
print(ner_results)

Output (recognized tokens replaced by their tags): "PER-B, PER-B 씨는 LOC-BLOC-ILOC-I LOC-ILOC-I LOC-ILOC-I LOC-ILOC-I LOC-ILOC-ILOC-I에서 진행된 IT 컨퍼런스에 참석했습니다."
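
The pipeline itself returns a list of dictionaries, one per tagged token, rather than a masked string. The sketch below shows one plausible way to turn such a list into tag-substituted text; it is not necessarily how the output above was produced. `mask_entities` is a hypothetical helper. It relies on the `entity`, `start`, and `end` keys of the pipeline output, and the character offsets are assumed to be populated, which requires a fast tokenizer. The `results` list here is hand-written for illustration and is not real model output.

```python
from typing import Dict, List


def mask_entities(text: str, results: List[Dict]) -> str:
    """Replace each tagged character span in `text` with its tag."""
    pieces = []
    cursor = 0
    for item in sorted(results, key=lambda r: r["start"]):
        if item["start"] < cursor:
            continue  # skip spans that overlap one already replaced
        pieces.append(text[cursor:item["start"]])  # untouched text before the span
        pieces.append(item["entity"])              # the tag in place of the token
        cursor = item["end"]
    pieces.append(text[cursor:])                   # trailing untouched text
    return "".join(pieces)


text = "홍길동 씨는 서울에 삽니다."
# Hand-written stand-in for pipeline output (illustrative only).
results = [
    {"entity": "PER-B", "start": 0, "end": 1},
    {"entity": "PER-I", "start": 1, "end": 3},
    {"entity": "LOC-B", "start": 7, "end": 9},
]
masked = mask_entities(text, results)
print(masked)
```

Because adjacent tokens are replaced with no separator between them, consecutive tags run together (`PER-BPER-I`), similar to the fused tags in the output shown above.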

Training and evaluation data

์ž์ฒด ์ œ์ž‘ํ•œ ํ•œ๊ตญ์ธ ๊ฐœ์ธ์ •๋ณด ํŒจํ„ด ๊ธฐ๋ฐ˜ ๊ฐœ์ฒด๋ช… ์ธ์‹ (NER) ๋ฐ์ดํ„ฐ์…‹

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 512
  • eval_batch_size: 1024
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
  • mixed_precision_training: Native AMP
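
As an illustration of the `linear` scheduler type, the sketch below computes the learning rate at a given step under a linear decay from the base rate to zero. The warmup length and the total number of steps are not given in this card, so `warmup_steps=0` and `total_steps=1000` are assumptions; `linear_lr` is a hypothetical helper written in plain Python.

```python
def linear_lr(step: int, total_steps: int,
              base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


total_steps = 1000  # assumed value, for illustration only
for step in (0, 500, 1000):
    print(step, linear_lr(step, total_steps))
```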

Framework versions

  • Transformers 4.40.0
  • Pytorch 2.2.1+cu118
  • Datasets 2.19.0
  • Tokenizers 0.19.1

Model size

  • 14.1M params (Safetensors)
  • Tensor types: I64, F32