metadata

base_model: mini1013/master_domain
library_name: setfit
metrics:
  - metric
pipeline_tag: text-classification
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
widget:
  - text: >-
      한글과컴퓨터 한컴오피스 2024 한글 Open 라이선스 [기업용/영구/2User이상] 한컴오피스 2024 (한글/한셀/한쇼)
      (주)유비소프트웨어
  - text: 한글과컴퓨터 한글 2022 (기업용/패키지/USB방식)  아이코다(주)
  - text: 한글과컴퓨터 한컴독스 기업용 ESD 1년 사용  (주)대성클라우드
  - text: '[한글과컴퓨터] 한컴오피스 2022 [기업용/패키지/1년사용/제품키배송형]  (주)컴퓨존'
  - text: '[마이크로소프트코리아] MS Windows 7 Professional DSP 한글 64bit/정품라벨  (주)소프트존'
inference: true
model-index:
  - name: SetFit with mini1013/master_domain
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: Unknown
          type: unknown
          split: test
        metrics:
          - type: metric
            value: 1
            name: Metric

SetFit with mini1013/master_domain

This is a SetFit model for text classification. It uses mini1013/master_domain as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
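The contrastive step relies on turning labeled texts into sentence pairs with a similarity target. The sketch below shows one simplified way to build such pairs, assuming exhaustive pairing over a tiny labeled set. The function name `make_contrastive_pairs` and the sample texts are hypothetical, and SetFit's own sampler works differently (it samples pairs rather than enumerating them all).

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, seed=42):
    """Build (text_a, text_b, target) triples from labeled texts.

    The target is 1.0 when both texts share a label and 0.0 otherwise,
    which is the kind of signal a cosine-similarity loss trains toward.
    Hypothetical helper for illustration; not part of the SetFit API.
    """
    rng = random.Random(seed)
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    rng.shuffle(pairs)  # mix positive and negative pairs before batching
    return pairs


# Hypothetical toy data: two texts per class.
texts = ["office suite A", "office suite B", "antivirus A", "antivirus B"]
labels = [0, 0, 1, 1]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(f"{target:.1f}  {text_a!r} <-> {text_b!r}")
```

With four texts in two classes this yields six pairs, two positive and four negative. The fine-tuned embeddings are then used as input features for the classification head in the second step.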

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: mini1013/master_domain
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 6

Model Sources

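Repository: SetFit on GitHub (https://github.com/huggingface/setfit)
Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts (https://huggingface.co/blog/setfit)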

Model Labels

Label	Examples
4	'정품 스토어 MS Windows 11 Home 한글 FPP 윈도우11 홈 설치USB 패키지 인증키 (주)에스비코어' '윈도우11 프로 FPP(USB) 노트북 업그레이드 전용상품 주식회사 이좋은세상' '[MS코리아정품] Windows 11 Pro FPP 한글 처음사용자용 영구 제품키 주식회사 레오솔루션'
1	'[Adobe] Photoshop for teams [기업용/라이선스/1년사용] [1개~~9개 구매시(1개당 가격)] [발송 3~~7일 소요] 갱신 (주)컴퓨존' 'Movavi Video Editor 2024 기업용 라이선스 / 모바비 주식회사 글래드소프트' 'Movavi Video Suite 2024 공공기관용 라이선스 / 모바비2024 메모리콕'
2	'안랩 V3 Net for Windows Server 9.0 DSP (1년) (주)위프로소프트' '안랩 V3 Net for Windows Server 9.0 (기업용/DSP/1년) 아이코다(주)' '안랩 V3 Net for Unix Server (기업용 1년사용) 아이코다(주)'
3	'[문자발송]한컴독스 개인용 1년(구독형 한컴오피스) / 윈도우 맥용 설치 파일 지원 주식회사 지엘스토어' '한컴독스 개인용 1년 제품키배송형(구독형 한컴오피스) / 윈도우 맥용 설치 파일 지원 확인 주식회사 라이프큐브' '[마이크로소프트] Office 2019 Home & Student PKC [가정용/패키지/한글] 택배 발송 오시리스랩 주식회사'
5	'[1분발송]리훈 오늘기억 일기장 다이어리 굿노트 아이패드 PDF 속지 3년 감사 1.오른손잡이용_1.3년다이어리 주식회사 리훈 (RIHOON CO., LTD.)' '[스티커2종] 24년 오리지날 굿노트 디지털 속지 - 데일리 가로형(1D2P 형식) (아이패드 갤럭시탭 하이퍼링크 PDF 속지) (주)프랭클린 플래너 코리아' '[1분발송]리훈 하고싶은말 일기장 다이어리 굿노트 아이패드 PDF 속지 날짜형（23년10월-24년12월）_오른손잡이용 주식회사 리훈 (RIHOON CO., LTD.)'
0	'Radmin 3 Standard license 기업용/ 영구(ESD) (주)삼경엠' 'Radmin 3 - 50 Licenses Pack 기업용 라이선스 /알어드민 / 원격지원 / 50대설치 메모리콕' 'Radmin 3 Standard 기업용 라이선스 /알어드민 / 원격지원 메모리콕'

Evaluation

Metrics

Label	Metric
all	1.0

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("mini1013/master_cate_el12")
# Run inference
preds = model("한글과컴퓨터 한컴독스 기업용 ESD 1년 사용  (주)대성클라우드")

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	6	11.8852	21

Label	Training Sample Count
0	3
1	34
2	33
3	50
4	50
5	13
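The per-label counts above are uneven, which is easier to see as shares of the whole training set. The short sketch below recomputes the total and each label's share from the table. The variable names are illustrative only.

```python
# Per-label training sample counts, copied from the table above.
label_counts = {0: 3, 1: 34, 2: 33, 3: 50, 4: 50, 5: 13}

total = sum(label_counts.values())
shares = {label: count / total for label, count in label_counts.items()}

# Ratio between the largest and smallest class, a rough imbalance indicator.
imbalance_ratio = max(label_counts.values()) / min(label_counts.values())

for label in sorted(label_counts):
    print(f"label {label}: {label_counts[label]:>2} samples ({shares[label]:.1%})")
print(f"total: {total} samples, largest/smallest class ratio: {imbalance_ratio:.1f}")
```

The training set has 183 samples in total. Label 0 contributes only 3 of them, while labels 3 and 4 contribute 50 each.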

Training Hyperparameters

batch_size: (512, 512)
num_epochs: (20, 20)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 40
body_learning_rate: (2e-05, 2e-05)
head_learning_rate: 2e-05
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0345	1	0.496	-
1.7241	50	0.0031	-
3.4483	100	0.0001	-
5.1724	150	0.0	-
6.8966	200	0.0	-
8.6207	250	0.0	-
10.3448	300	0.0	-
12.0690	350	0.0	-
13.7931	400	0.0	-
15.5172	450	0.0	-
17.2414	500	0.0	-
18.9655	550	0.0	-
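The fractional epoch values in the table can be reproduced from the hyperparameters and the training set size. The sketch below assumes that each of the `num_iterations` passes contributes one positive and one negative pair per training sample. That is an assumption about the pair sampler rather than something this card states, but it matches the logged epoch fractions.

```python
import math

# Values taken from this card.
n_train = 3 + 34 + 33 + 50 + 50 + 13  # sum of the per-label sample counts
num_iterations = 40
batch_size = 512
num_epochs = 20

# Assumption: one positive and one negative pair per sample per iteration.
pairs_per_epoch = 2 * num_iterations * n_train
steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 4)


print(pairs_per_epoch, steps_per_epoch, total_steps)
for step in (1, 50, 550):
    print(step, epoch_at(step))
```

Under that assumption an epoch covers 14,640 pairs in 29 steps, so 20 epochs take 580 steps, and steps 1, 50, and 550 fall at epochs 0.0345, 1.7241, and 18.9655, as logged.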

Framework Versions

Python: 3.10.12
SetFit: 1.1.0.dev0
Sentence Transformers: 3.1.1
Transformers: 4.46.1
PyTorch: 2.4.0+cu121
Datasets: 2.20.0
Tokenizers: 0.20.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}