SetFit with mini1013/master_domain

This is a SetFit model for text classification. It uses mini1013/master_domain as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
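The first step can be illustrated with a simplified pair-sampling sketch. It shows how labelled examples become sentence pairs for contrastive fine-tuning with CosineSimilarityLoss: a pair with the same label gets a target similarity of 1.0, and a pair with different labels gets 0.0. The helper `make_contrastive_pairs` is hypothetical and is not SetFit's internal sampler, which supports several sampling strategies.

```python
import random


def make_contrastive_pairs(texts, labels, num_iterations, seed=42):
    """Hypothetical helper: build (text_a, text_b, target) pairs for contrastive fine-tuning.

    For every iteration and every example, draw one positive pair (same label,
    target 1.0) and one negative pair (different label, target 0.0).
    Requires at least two distinct labels.
    """
    rng = random.Random(seed)

    # Group texts by label so positives and negatives can be sampled quickly.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    pairs = []
    for _ in range(num_iterations):
        for text, label in zip(texts, labels):
            # Positive pair: any sample sharing this label (may be the text itself).
            positive = rng.choice(by_label[label])
            pairs.append((text, positive, 1.0))
            # Negative pair: a sample drawn from some other label.
            other_label = rng.choice([lab for lab in by_label if lab != label])
            negative = rng.choice(by_label[other_label])
            pairs.append((text, negative, 0.0))
    return pairs
```

Each iteration yields two pairs per example, so the number of pairs grows linearly with `num_iterations` and the dataset size.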

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: mini1013/master_domain
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 8 classes
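To make the role of the classification head concrete, here is a pure-Python sketch of how a multinomial logistic regression turns one sentence embedding into class probabilities: one linear score per class, followed by a softmax. It assumes scikit-learn's default multinomial formulation. The weights, the 2-dimensional embedding and the 3 classes are toy values; the real head learns its coefficients from mini1013/master_domain embeddings and predicts 8 classes.

```python
import math


def logistic_regression_proba(embedding, coef, intercept):
    """Multinomial logistic regression: softmax(coef @ embedding + intercept)."""
    scores = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(coef, intercept)
    ]
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(embedding, coef, intercept, classes):
    """Return the class whose probability is highest."""
    probs = logistic_regression_proba(embedding, coef, intercept)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best]


# Toy values (hypothetical): 3 classes, 2-dimensional embeddings.
coef = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
intercept = [0.0, 0.0, 0.0]
classes = [0.0, 1.0, 2.0]
print(predict_label([2.0, 0.5], coef, intercept, classes))  # prints 0.0
```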

Model Sources

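Repository: SetFit on GitHub (https://github.com/huggingface/setfit)
Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts (https://huggingface.co/blog/setfit)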

Model Labels

Label	Examples
6.0	'아이언맥스 프로틴 쨈 스프레드 초코 아몬드 250g 2팩 IronMaxx 블레스윤' '페레로 누텔라 헤이즐넛 코코아 스프레드 370g 3개 누텔라 헤이즐넛 코코아 스프레드 370g 3개 홈마트' '누텔라 헤이즐넛 코코아 스프레드 370g x 2개 [라면] 봉지라면_오뚜기 짜슐랭 145g 20개 옐로우로켓'
1.0	'[가당딸기] 국산 냉동 가당딸기 2kg 아이스베리 (6개/박스) 주식회사 커피바바' '복음자리 진심의 딸기 1kg 딸기청 🍓진심의 딸기 1kg 5개🍓 담다' '초록원 과일잼 1kg x 2개 딸기잼 1021653 딸기잼1kg 블루베리잼1kg_파인애플망고잼1kg 앤디월드'
5.0	'Torani 무설탕 소스, 다크 초콜릿, 1.9L(64온스) 화이트 초콜릿_64 Fl Oz (Pack of 1) 저무리5' '모카믹스 다크소스 초콜렛 2kg 1박스 6개 초코소스 엠씨컴퍼니 (주)' '매일유업 테너소스 초콜렛 1.35kg 1병 카라멜 1.35kg 티피컨테이너'
4.0	'오뚜기 맛있는 사과쨈 300G 홈카페 식재료 토스트 브런치 캠핑 아이들 간식 봄날스토어' '오뚜기 Light sugar 사과쨈 290g 4개 007스테이지스' '[달콤한 맛있는] 밀크스프레드 얼그레이 235g [블루베리 딸기 사과 포도 버터맛] 레인보우'
0.0	'포모나 얼그레이 하이볼 시럽 밀크티 홍차 1000ml 06-포모나 카라멜 시럽 주식회사 커피창고' '프프프베이커리 빵에 발라먹는 버터스프레드 얼그레이 맛 【1개】 허니 데칼컴퍼니(Decal Company)' '매일 테너베이스 청포도 에이드 스무디 농축액 1.2kg 1022147 오렌지 1.2kg 가이던스'
3.0	'LB 메이플시럽189ml(병) (N2) 주식회사 에스에스지닷컴' '마누카 헬스 Manuka health 마누카 허니 MGO 250+ 시럽 100ml K&G GmbH' '시럽 초콜렛 네이처 컨트리 라몬제이'
7.0	'커피시럽 카페시럽 1.5L x2병 대상 롯데 파우더 커피 대상 로즈버드 그린티 파우더 500g 가루녹차 하늘담아' '토라니 카라멜 미니 토핑용소스 468g / 카라멜마끼야또 카라멜라떼 (주)오케이푸드' '1883 헤이즐넛시럽 1883 라임 시럽 1000ml 엔에프 컴퍼니'
2.0	'신세계 가공리고땅콩버터크리미 462g 주식회사 에스에스지닷컴' '스키피 땅콩버터1.36kg 스키피 크리미 땅콩버터 2.27kg 두두유통' '피비핏 버터 오리지널 파우더 피넛 프리 프로틴 글루텐 850g 에코프리'

Evaluation

Metrics

Label	Metric
all	0.6548

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("mini1013/master_cate_fd16")
# Run inference
preds = model("리고 초코 시럽 585g 2개세트  (주)비앤씨인터내셔널")
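Besides hard labels, SetFitModel also offers `predict_proba`, which can help route uncertain product titles to manual review. The helper below is a hypothetical post-processing step, not part of SetFit: it keeps the most probable label only when its probability clears a threshold. It assumes the probability columns follow the sorted label values 0.0 to 7.0, as scikit-learn's `classes_` attribute does for a LogisticRegression head.

```python
def top_label_or_none(prob_rows, classes, threshold=0.5):
    """Return the most probable class per row, or None if it falls below threshold."""
    results = []
    for row in prob_rows:
        best = max(range(len(row)), key=row.__getitem__)
        results.append(classes[best] if row[best] >= threshold else None)
    return results


# Assumption: columns follow the sorted label values, as in scikit-learn's classes_.
CLASSES = [float(i) for i in range(8)]

# In practice the rows could come from something like:
#   prob_rows = model.predict_proba(texts, as_numpy=True).tolist()
```

A higher threshold trades coverage for precision: more titles come back as None, but the labels that are returned are more likely to be correct.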

Training Details

Training Set Metrics

Training set	Min	Mean	Max
Word count	4	10.8025	29
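Word-count statistics like these can be recomputed for any dataset with a few lines of Python. This sketch assumes that a "word" is a whitespace-separated token; SetFit's own counting rule may differ slightly, and `word_count_stats` is a hypothetical helper. It returns the minimum, median, mean and maximum word count.

```python
import statistics


def word_count_stats(texts):
    """Min, median, mean and max word counts, treating whitespace-separated tokens as words."""
    counts = [len(t.split()) for t in texts]
    return min(counts), statistics.median(counts), statistics.mean(counts), max(counts)
```

For integer word counts the median is always a whole or half-integer value, so a non-integer figure such as 10.8025 over 400 samples corresponds to a mean (4321 / 400).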

Label	Training Sample Count
0.0	50
1.0	50
2.0	50
3.0	50
4.0	50
5.0	50
6.0	50
7.0	50

Training Hyperparameters

batch_size: (512, 512)
num_epochs: (20, 20)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 40
body_learning_rate: (2e-05, 2e-05)
head_learning_rate: 2e-05
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0159	1	0.4035	-
0.7937	50	0.322	-
1.5873	100	0.125	-
2.3810	150	0.0315	-
3.1746	200	0.0111	-
3.9683	250	0.0005	-
4.7619	300	0.0002	-
5.5556	350	0.0001	-
6.3492	400	0.0001	-
7.1429	450	0.0001	-
7.9365	500	0.0001	-
8.7302	550	0.0001	-
9.5238	600	0.0001	-
10.3175	650	0.0001	-
11.1111	700	0.0	-
11.9048	750	0.0001	-
12.6984	800	0.0	-
13.4921	850	0.0	-
14.2857	900	0.0	-
15.0794	950	0.0	-
15.8730	1000	0.0	-
16.6667	1050	0.0	-
17.4603	1100	0.0	-
18.2540	1150	0.0001	-
19.0476	1200	0.0	-
19.8413	1250	0.0	-
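The Epoch column can be reconciled with the hyperparameters through a short calculation. Assuming that, with num_iterations set, SetFit draws one positive and one negative pair per training sample per iteration, the 400 training samples yield 2 x 40 x 400 = 32,000 pairs per epoch. That gives ceil(32,000 / 512) = 63 steps per epoch and 1,260 steps over 20 epochs. Each logged epoch value is then step / 63, and with logging every 50 steps the last logged step is 1,250.

```python
import math

NUM_SAMPLES = 8 * 50   # 8 labels x 50 training samples each
NUM_ITERATIONS = 40
BATCH_SIZE = 512
NUM_EPOCHS = 20

# Assumption: one positive and one negative pair per sample per iteration.
num_pairs = 2 * NUM_ITERATIONS * NUM_SAMPLES          # 32,000 pairs
steps_per_epoch = math.ceil(num_pairs / BATCH_SIZE)   # last partial batch still counts as a step
total_steps = steps_per_epoch * NUM_EPOCHS


def epoch_at(step):
    """Fractional epoch reached after a given optimizer step, rounded like the table."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps)  # prints 63 1260
print(epoch_at(50))                  # prints 0.7937
```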

Framework Versions

Python: 3.10.12
SetFit: 1.1.0.dev0
Sentence Transformers: 3.1.1
Transformers: 4.46.1
PyTorch: 2.4.0+cu121
Datasets: 2.20.0
Tokenizers: 0.20.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
