SetFit with mini1013/master_domain

This is a SetFit model for text classification. It uses mini1013/master_domain as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
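
The contrastive step above trains on sentence pairs rather than single labeled sentences. The following is a minimal sketch of how such pairs could be built from a handful of labeled examples. The helper name `build_contrastive_pairs` is hypothetical, the product titles are shortened versions of entries in the label table, and the exhaustive pairing is a simplification for illustration, not SetFit's actual sampler.

```python
import random
from itertools import combinations


def build_contrastive_pairs(texts, labels, seed=42):
    """Return (text_a, text_b, target) triples.

    target is 1.0 when both texts share a label and 0.0 otherwise,
    which is the kind of target a cosine-similarity loss expects.
    """
    rng = random.Random(seed)
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    rng.shuffle(pairs)
    return pairs


# Toy data: two examples each for labels 3 and 4 (titles shortened).
texts = [
    "ADATA Ultimate SU650 120GB",
    "EXCERIA PLUS G3 M.2 NVMe",
    "Sandisk Extreme Pro CZ880 (128GB)",
    "Sandisk Cruzer Glide CZ600 (16GB)",
]
labels = [3, 3, 4, 4]

pairs = build_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(positives), len(negatives))
```

Even four labeled sentences yield six training pairs, which is why contrastive fine-tuning can work with few examples per class.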

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: mini1013/master_domain
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 12

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
3	'키오시아 EXCERIA PLUS G3 M.2 NVMe 엄지척스토어' '[키오시아] EXCERIA G2 M.2 NVMe (500GB) 주식회사 에티버스이비티' 'ADATA Ultimate SU650 120GB 밀알시스템'
1	'시놀로지 Expansion Unit DX517 (5베이/하드미포함) 타워형 확장 유닛 DS1817+, DS1517+ (주)비엔지센터' '[아이피타임 쇼핑몰] NAS1 dual 1베이 나스 (하드미포함) (주)에이치앤인터내셔널' '시놀로지 정품 나스 DS223 2베이 NAS 스토리지 클라우드 서버 구축 시놀로지 NAS DS223 유심홀릭'
0	'씨게이트 바라쿠다 1TB ST1000DM010 SATA3 64M 1테라 하드 오늘 출발 주식회사 호스트시스템' 'WD BLUE (WD20EZBX) 3.5 SATA HDD (2TB/7200rpm/256MB/SMR) 아이코다(주)' '씨게이트 IronWolf 8TB ST8000VN004 (SATA3/7200/256M) (주)조이젠'
4	'Sandisk Extreme Pro CZ880 (128GB) (주)아이티엔조이' 'Sandisk Cruzer Glide CZ600 (16GB) 컴튜브 주식회사' '샌디스크 울트라 핏 USB 3.1 32GB Ultra Fit CZ430 초소형 주식회사 에스티원테크'
6	'NEXT-DC3011TS 1:11 HDD SSD 스마트 하드복사 삭제기 리벤플러스' '넥시 NX-802RU31 2베이 RAID 데이터 스토리지 하드 도킹스테이션 (NX768) 대성NETWORK' '넥시 USB3.1 C타입 2베이 DAS 데이터 스토리지 NX768 (주)팁스커뮤니케이션즈'
11	'이지넷유비쿼터스 NEXT-215U3 (하드미포함) (주)컴파크씨앤씨' 'ORICO PHP-35 보라 3.5인치 하드 보호케이스 (주)조이젠' '[ORICO] PHP-35 3.5형 하드디스크 보관함 [블루] (주)컴퓨존'
2	'(주)근호컴 [라인업시스템]LS-EXODDC 외장ODD (주)근호컴' '[라인업시스템] LANSTAR LS-BRODD 블루레이 외장ODD 주식회사 에티버스이비티' '넥스트유 NEXT-200DVD-RW USB3.0 DVD-RW 드라이브 ) (주)인컴씨엔에스'
5	'(주)근호컴 [멜로디]1P 투명 연질 CD/DVD 케이스 (10장) (주)근호컴' 'HP CD-R 10P / 52X 700MB / 원통케이스 포장 제품 티앤제이 (T&J) 통상' '엑토 CD롬컨테이너_50매입 CDC-50K /CD보관함/CD케이스/씨디보관함/씨디케이스/cd정리함 CDC-50K 아이보리 솔로몬샵'
9	'시놀로지 비드라이브 BDS70-1T BeeDrive 1TB 외장SSD 개인 백업허브 정품 솔루션 웍스(Solution Works)' 'CORSAIR EX100U Portable SSD Type C (1TB) (주)아이티엔조이' 'ASUS ROG STRIX ARION ESD-S1C M 2 NVMe SSD 외장케이스 (주)아이웍스'
8	'넥스트유 NEXT-651DCU3 도킹스테이션 2베이 (주)수빈인포텍' '이지넷유비쿼터스 넥스트유 659CCU3 도킹 스테이션 주식회사 매커드' '이지넷유비쿼터스 NEXT-644DU3 4베이 도킹스테이션 에이치엠에스'
10	'USB3.0 4베이 DAS 스토리지 NX770 (주)담다몰' '[NEXI] NX-804RU30 외장 케이스 HDD SSD USB 3.0 4베이 하드 도킹스테이션 NX770 주식회사 유진정보통신' '[NEXI] 넥시 NX-804RU30 RAID (4베이) [USB3.0] [NX770] [DAS] [하드미포함] (주)컴퓨존'
7	'USB3.0 하드 도킹스테이션 복제 복사 클론 복사기 HDD SSD 2.5인치 3.5인치 듀얼 외장하드 케이스 Q6GCLONE 퀄리티어슈런스' 'USB3.0 하드 도킹스테이션 복제 복사 클론 복사기 HDD SSD 2.5인치 3.5인치 듀얼 외장하드 케이스 28TB지원 퀄리티어슈런스' 'NEXT 652DCU3 HDD복제기능탑재/도킹스테이션/2.5인치/3.5인치/백업/클론기능 마하링크'

Evaluation

Metrics

Label	Metric
all	0.7786

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("mini1013/master_cate_el16")
# Run inference
preds = model("이지넷 NEXT-350U3 3.5 외장케이스/USB3.0 하드미포함  레알몰")
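
For more than one input, it can be convenient to classify a list of texts and keep a confidence score alongside each prediction. The sketch below assumes the `predict` and `predict_proba` methods of `SetFitModel`. The helper names `load_model` and `classify_batch` are hypothetical, and the model download is left as a commented usage example because it needs network access.

```python
def load_model(repo_id="mini1013/master_cate_el16"):
    """Download the model from the Hub (needs `pip install setfit` and network access)."""
    from setfit import SetFitModel  # lazy import so classify_batch has no hard dependency
    return SetFitModel.from_pretrained(repo_id)


def classify_batch(model, texts):
    """Return one (text, predicted_label, confidence) tuple per input text."""
    preds = model.predict(texts)
    probs = model.predict_proba(texts)
    # The model may return tensors or arrays; convert them to plain lists.
    preds = preds.tolist() if hasattr(preds, "tolist") else list(preds)
    probs = probs.tolist() if hasattr(probs, "tolist") else list(probs)
    # Confidence is taken as the highest class probability for each row.
    return [(text, pred, max(row)) for text, pred, row in zip(texts, preds, probs)]


# Usage (not executed here because it downloads the model):
# model = load_model()
# for text, label, confidence in classify_batch(model, ["first title", "second title"]):
#     print(label, round(confidence, 3), text)
```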

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	4	9.6059	20

Label	Training Sample Count
0	50
1	50
2	50
3	50
4	50
5	50
6	50
7	3
8	50
9	50
10	7
11	50
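
The counts above are not balanced: labels 7 and 10 have far fewer samples than the rest. A short sketch that makes this explicit from the table values follows. The 50% cutoff used to flag a label as under-represented is an arbitrary assumption for illustration.

```python
# Per-label training sample counts, copied from the table above.
train_counts = {
    0: 50, 1: 50, 2: 50, 3: 50, 4: 50, 5: 50,
    6: 50, 7: 3, 8: 50, 9: 50, 10: 7, 11: 50,
}

total = sum(train_counts.values())
largest = max(train_counts.values())

# Flag labels with fewer than half the samples of the largest class
# (the 0.5 threshold is an illustrative choice, not from the model card).
underrepresented = {
    label: n for label, n in train_counts.items() if n < 0.5 * largest
}

for label, n in sorted(underrepresented.items()):
    print(f"label {label}: {n} samples ({n / total:.1%} of the training set)")
```

Predictions for the flagged labels may be less reliable than the aggregate metric suggests, so per-label evaluation is worth considering for them.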

Training Hyperparameters

batch_size: (512, 512)
num_epochs: (20, 20)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 40
body_learning_rate: (2e-05, 2e-05)
head_learning_rate: 2e-05
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
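
As a rough sketch, the values listed above could be passed to `setfit.TrainingArguments` as shown below. The `HYPERPARAMS` dictionary and the `build_training_arguments` helper are hypothetical names. `loss` and `distance_metric` are omitted on the assumption that `CosineSimilarityLoss` and `cosine_distance` are the library defaults, which should be checked against the installed SetFit version.

```python
# Hyperparameters copied from the list above (loss and distance_metric omitted;
# they are assumed to match the library defaults).
HYPERPARAMS = {
    "batch_size": (512, 512),
    "num_epochs": (20, 20),
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "num_iterations": 40,
    "body_learning_rate": (2e-05, 2e-05),
    "head_learning_rate": 2e-05,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "seed": 42,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments():
    """Create a TrainingArguments object (needs `pip install setfit`)."""
    from setfit import TrainingArguments  # lazy import keeps the dict usable on its own
    return TrainingArguments(**HYPERPARAMS)


# Usage (not executed here):
# args = build_training_arguments()
```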

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0125	1	0.497	-
0.625	50	0.2348	-
1.25	100	0.0733	-
1.875	150	0.0254	-
2.5	200	0.0165	-
3.125	250	0.0122	-
3.75	300	0.0021	-
4.375	350	0.0024	-
5.0	400	0.001	-
5.625	450	0.0019	-
6.25	500	0.0002	-
6.875	550	0.0007	-
7.5	600	0.0009	-
8.125	650	0.0002	-
8.75	700	0.0002	-
9.375	750	0.0003	-
10.0	800	0.0002	-
10.625	850	0.0002	-
11.25	900	0.0002	-
11.875	950	0.0001	-
12.5	1000	0.0001	-
13.125	1050	0.0001	-
13.75	1100	0.0001	-
14.375	1150	0.0001	-
15.0	1200	0.0001	-
15.625	1250	0.0001	-
16.25	1300	0.0001	-
16.875	1350	0.0001	-
17.5	1400	0.0001	-
18.125	1450	0.0001	-
18.75	1500	0.0001	-
19.375	1550	0.0001	-
20.0	1600	0.0001	-
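
The step and epoch columns above can be cross-checked against the training set size and hyperparameters. The sketch below assumes that each training sample contributes `num_iterations` positive and `num_iterations` negative pairs. That pair-count formula is an assumption, but it reproduces the logged 1600 steps exactly.

```python
import math

n_samples = 510      # sum of the per-label training sample counts
num_iterations = 40  # from the training hyperparameters
batch_size = 512
num_epochs = 20

# Assumption: 2 * num_iterations pairs (positive + negative) per sample.
n_pairs = n_samples * num_iterations * 2
steps_per_epoch = math.ceil(n_pairs / batch_size)
total_steps = steps_per_epoch * num_epochs

print(n_pairs, steps_per_epoch, total_steps)  # 40800 80 1600
print(50 / steps_per_epoch)                   # 0.625, the epoch logged at step 50
```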

Framework Versions

Python: 3.10.12
SetFit: 1.1.0.dev0
Sentence Transformers: 3.1.1
Transformers: 4.46.1
PyTorch: 2.4.0+cu121
Datasets: 2.20.0
Tokenizers: 0.20.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
