	---
	library_name: setfit
	tags:
	- setfit
	- sentence-transformers
	- text-classification
	- generated_from_setfit_trainer
	metrics:
	- accuracy
	widget:
	- text: '首都圏最大級の店舗物件専門情報サービス会社です。

	次々に出店する店舗さま達と一緒に成長しましょう！


	あの店も、この店も、ホクトシステムで出店したんだ！と街中を歩くのが楽しくなります( ^ω^ )'
	- text: 'けやき出版は、多摩のひととまちをつないでいくという企業理念のもと、書籍・情報誌・パンフレット・会社案内・社史・ロゴ制作やWEBサイトの記事制作などの仕事をさせていただいています。


	2020年6月には、新しい多摩の情報誌「BALL.」を創刊（年２回発行）し、多摩エリアではずむように働こう！というタグラインのもと、多摩エリアの仕事に特化した内容を読者の方に届けています。

	BALL.を中心に始まった、WEB MAGAZINEなど、クリエイター自らの企画参加型のメディアを形成しています。

	'
	- text: '私たちの存在意義（Purpose）は、「利他であふれる社会を創る」です。

	利他とは、他人に利益を与えること。自分の事よりも他人の幸福を願うこと。シンプルに言うと「求め合うより与え合う」そんな表現がぴったりかもしれません。


	私たちの夢は、世界中のすべての人たちが夢と勇気と笑顔に溢れた社会を創ることです。そのためには「利他の精神」は必要不可欠です。

	誰かに期待するのではなく、自ら利他の精神を持ち「どうすれば社会や他人を幸せにすることができるのだろう」を日々考え行動し、社会を良くしていきます。'
	- text: '「ラテンアメリカと日本の新しい歴史を創り、人々の人生を豊かにする」ことを理念に、メキシコ合衆国を中心に事業を展開しています！現在はメキシコ合衆国を中心に、メキシコ人・日本人およびセルビア人の合計80名で活動しています。


	主な事業内容は以下の通りです。


	①広告代理店、ならびに各種コンサルティング事業 (企画・営業部)


	②ラテンアメリカ域内における日本食レストランの運営事業 (Food & Beverage 事業部)


	'
	- text: 次世代を担う子どもたちへプログラミングの面白さを伝えるキッズプログラミングスクール「ツクル」を運営するスタートアップ。
	pipeline_tag: text-classification
	inference: true
	---

	# SetFit

	This is a [SetFit](https://github.com/huggingface/setfit) model for text classification. It assigns one of two labels, `Recurring` or `Non-recurring`, and uses a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance as the classification head.

	The model has been trained using an efficient few-shot learning technique that involves:

	1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
	2. Training a classification head with features from the fine-tuned Sentence Transformer.
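
The pair construction behind step 1 can be sketched in plain Python. This is a minimal illustration, not SetFit's own sampler: the helper name `make_contrastive_pairs` and the placeholder texts are hypothetical, and SetFit samples pairs according to its `sampling_strategy` rather than enumerating every combination as done here.

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, seed=42):
    """Return (text_a, text_b, target) triples for contrastive fine-tuning.

    target is 1.0 when both texts share a label and 0.0 otherwise, which is
    the kind of similarity target a cosine-similarity loss trains against.
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    # Shuffle deterministically so batches mix positive and negative pairs.
    random.Random(seed).shuffle(pairs)
    return pairs


# Hypothetical placeholder texts, not taken from the training data.
texts = ["example a", "example b", "example c", "example d"]
labels = ["Recurring", "Recurring", "Non-recurring", "Non-recurring"]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(pairs), len(positives), len(negatives))  # 6 2 4
```

In step 2, the fine-tuned Sentence Transformer embeds each labeled text, and the classification head is fitted on those embeddings.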

	## Model Details

	### Model Description
	- Model Type: SetFit
	<!-- - Sentence Transformer: [Unknown](https://huggingface.co/unknown) -->
	- Classification head: a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
	- Maximum Sequence Length: 512 tokens
	- Number of Classes: 2
	<!-- - Training Dataset: [Unknown](https://huggingface.co/datasets/unknown) -->
	<!-- - Language: Unknown -->
	<!-- - License: Unknown -->

	### Model Sources

	- Repository: [SetFit on GitHub](https://github.com/huggingface/setfit)
	- Paper: [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
	- Blogpost: [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

	### Model Labels
	| Label | Examples |
	|:--------------|:---------|
	| Non-recurring | <ul><li>'海外に拠点を置くセカンダリー・プライベート・エクイティ投資に特化した独立系運用会社。'</li><li>'我々映像機器システム社は、大正11年から続く長い歴史のある企業です。\n「夢と感動を皆様にお届けする」という信念は今も昔も、これからも変わることはありません。\n\n我々の仕事を通して、エンドユーザーには想像以上の映像・音響体験が生まれ、感謝と感動が循環します。\nまるで自分も映像の中に入り込んだように、全身で作品を味わっていただいています。\n\nただ「映画を観る」だけではなく、「映画を体験」し、その先の感動をお届けし続ける企業であり続けたいと思っています！'</li><li>'報道関係者向けイベントのオンライン開催を支援する動画配信サービス「プレスメイク」などを運営するスタートアップ。'</li></ul> |
	| Recurring | <ul><li>'お米・麦・大豆を使い、\n飲食業に特化した今までにない\n飲食業態を開発運営する会社です。\n\nキャッチコピーは「お米をデザインする」\n\n自社農園での農業から加工・販売まで\n\n育てる・作る・販売するを一貫して行なっており、\n一次産業から三次産業まで\n全てのシーンでお米をデザインしながら、\n日本の食文化で常に\n新しいチャレンジをしています。'</li><li>'アニメ評価ランキングサイト「あにこれ」を運営するスタートアップ。'</li><li>'100人いれば、100通りの美しさがあり、100通りのらしさがある。\n創業以来ずっと、私たちは患者様一人ひとりと向き合い、\n患者さまの立場に立った施術を行うことを信念としてやってきました。\n例えば、カウンセラーではなく医師が時間をかけて患者様と向き合って\nカウンセリングしているのもそのスタンスを実現するためです。\n\n一人一人にクオリティの高い治療を行うために。\n最新技術の研鑽はもちろんのこと、チームワークを大切にしながら、\n美容医療をいかに進化させることができるかを真剣に学べる環境です。'</li></ul> |

	## Uses

	### Direct Use for Inference

	First install the SetFit library:

	```bash
	pip install setfit
	```

	Then you can load this model and run inference.

	```python
	from setfit import SetFitModel

	# Download from the 🤗 Hub
	model = SetFitModel.from_pretrained("Kurrant/RevenueStreamJP")
	# Run inference
	preds = model("次世代を担う子どもたちへプログラミングの面白さを伝えるキッズプログラミングスクール「ツクル」を運営するスタートアップ。")
	```
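
For more than one text, the same model object can be called on a list. The sketch below wraps that in a small helper; `classify_batch` and `load_model` are hypothetical names introduced here, and the exact type of each prediction (label string, integer id, or tensor element) depends on how the model was saved, so the helper returns predictions unchanged.

```python
from typing import Any, Iterable, List, Tuple


def classify_batch(model: Any, texts: Iterable[str]) -> List[Tuple[str, Any]]:
    """Pair each input text with the model's prediction for it."""
    texts = list(texts)
    # SetFitModel.predict accepts a list of strings and returns one
    # prediction per input, in the same order.
    preds = model.predict(texts)
    return list(zip(texts, preds))


def load_model() -> Any:
    """Download the model from the Hub (requires `pip install setfit`)."""
    from setfit import SetFitModel  # imported lazily so the helper above has no hard dependency

    return SetFitModel.from_pretrained("Kurrant/RevenueStreamJP")
```

Typical use would be `classify_batch(load_model(), descriptions)`, where `descriptions` is a list of Japanese company descriptions.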

	<!--
	### Downstream Use

	List how someone could finetune this model on their own dataset.
	-->

	<!--
	### Out-of-Scope Use

	List how the model may foreseeably be misused and address what users ought not to do with the model.
	-->

	<!--
	## Bias, Risks and Limitations

	What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.
	-->

	<!--
	### Recommendations

	What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.
	-->

	## Training Details

	### Training Set Metrics
	| Training set | Min | Median | Max |
	|:-------------|:----|:-------|:----|
	| Word count | 1 | 2.0785 | 65 |
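
The word counts above are small for multi-sentence company descriptions. One plausible explanation, stated here as an assumption because the card does not say how the count is computed, is that words are counted by splitting on whitespace, which Japanese text rarely contains. The sketch below applies that counting method to the inference example from this card.

```python
# Inference example taken from this model card.
text = "次世代を担う子どもたちへプログラミングの面白さを伝えるキッズプログラミングスクール「ツクル」を運営するスタートアップ。"

# Assumption: "word count" means the number of whitespace-separated tokens.
whitespace_words = len(text.split())
characters = len(text)

print(whitespace_words)  # 1
print(characters > whitespace_words)  # True
```

Under that assumption, the table reflects how often a description contains spaces or line breaks rather than how long it is.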

	| Label | Training Sample Count |
	|:--------------|:----------------------|
	| Non-recurring | 929 |
	| Recurring | 1467 |
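
Because the card lists accuracy as its metric, the class balance is worth working out from the counts above. The short calculation below derives each label's share and the accuracy of a trivial classifier that always predicts the larger class; any reported accuracy should be read against that baseline.

```python
# Training sample counts from the table above.
counts = {"Non-recurring": 929, "Recurring": 1467}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy obtained by always predicting the most frequent label.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"total: {total}, majority baseline: {majority_baseline:.1%}")
```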

	### Training Hyperparameters
	- batch_size: (8, 8)
	- num_epochs: (2, 2)
	- max_steps: -1
	- sampling_strategy: oversampling
	- num_iterations: 3
	- body_learning_rate: (2e-05, 2e-05)
	- head_learning_rate: 2e-05
	- loss: CosineSimilarityLoss
	- distance_metric: cosine_distance
	- margin: 0.25
	- end_to_end: False
	- use_amp: False
	- warmup_proportion: 0.1
	- seed: 42
	- eval_max_steps: -1
	- load_best_model_at_end: False
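
To reproduce a comparable run, the values above can be passed to SetFit's `TrainingArguments`. This is a sketch under assumptions: the dictionary name `HYPERPARAMETERS` and the helper `build_training_arguments` are introduced here for illustration, and `loss` and `distance_metric` are omitted because, to my understanding, `CosineSimilarityLoss` and `cosine_distance` are already the library defaults.

```python
# Values copied from the hyperparameter list above.
# Two-element tuples are (embedding fine-tuning phase, classifier phase).
HYPERPARAMETERS = {
    "batch_size": (8, 8),
    "num_epochs": (2, 2),
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "num_iterations": 3,
    "body_learning_rate": (2e-05, 2e-05),
    "head_learning_rate": 2e-05,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "seed": 42,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments():
    """Create a setfit.TrainingArguments object from the values above."""
    from setfit import TrainingArguments  # requires `pip install setfit`

    # loss and distance_metric are left at their defaults (assumed to match the card).
    return TrainingArguments(**HYPERPARAMETERS)
```

The resulting object would then be handed to a `setfit.Trainer` together with a base model and a labeled dataset, neither of which this card identifies.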

	### Training Results
	| Epoch | Step | Training Loss | Validation Loss |
	|:------:|:----:|:-------------:|:---------------:|
	| 0.0006 | 1 | 0.3 | - |
	| 0.0278 | 50 | 0.2769 | - |
	| 0.0556 | 100 | 0.2192 | - |
	| 0.0835 | 150 | 0.323 | - |
	| 0.1113 | 200 | 0.2692 | - |
	| 0.1391 | 250 | 0.1603 | - |
	| 0.1669 | 300 | 0.3578 | - |
	| 0.1948 | 350 | 0.197 | - |
	| 0.2226 | 400 | 0.3582 | - |
	| 0.2504 | 450 | 0.2184 | - |
	| 0.2782 | 500 | 0.182 | - |
	| 0.3061 | 550 | 0.2353 | - |
	| 0.3339 | 600 | 0.2287 | - |
	| 0.3617 | 650 | 0.1228 | - |
	| 0.3895 | 700 | 0.2276 | - |
	| 0.4174 | 750 | 0.2181 | - |
	| 0.4452 | 800 | 0.2857 | - |
	| 0.4730 | 850 | 0.2361 | - |
	| 0.5008 | 900 | 0.2545 | - |
	| 0.5287 | 950 | 0.1986 | - |
	| 0.5565 | 1000 | 0.3308 | - |
	| 0.5843 | 1050 | 0.2126 | - |
	| 0.6121 | 1100 | 0.18 | - |
	| 0.6400 | 1150 | 0.1206 | - |
	| 0.6678 | 1200 | 0.1441 | - |
	| 0.6956 | 1250 | 0.1999 | - |
	| 0.7234 | 1300 | 0.1518 | - |
	| 0.7513 | 1350 | 0.1713 | - |
	| 0.7791 | 1400 | 0.033 | - |
	| 0.8069 | 1450 | 0.1999 | - |
	| 0.8347 | 1500 | 0.0766 | - |
	| 0.8625 | 1550 | 0.1551 | - |
	| 0.8904 | 1600 | 0.363 | - |
	| 0.9182 | 1650 | 0.0398 | - |
	| 0.9460 | 1700 | 0.1047 | - |
	| 0.9738 | 1750 | 0.0475 | - |
	| 1.0017 | 1800 | 0.0331 | - |
	| 1.0295 | 1850 | 0.0113 | - |
	| 1.0573 | 1900 | 0.0099 | - |
	| 1.0851 | 1950 | 0.2228 | - |
	| 1.1130 | 2000 | 0.1168 | - |
	| 1.1408 | 2050 | 0.0687 | - |
	| 1.1686 | 2100 | 0.0018 | - |
	| 1.1964 | 2150 | 0.0043 | - |
	| 1.2243 | 2200 | 0.0016 | - |
	| 1.2521 | 2250 | 0.0488 | - |
	| 1.2799 | 2300 | 0.0029 | - |
	| 1.3077 | 2350 | 0.0053 | - |
	| 1.3356 | 2400 | 0.0659 | - |
	| 1.3634 | 2450 | 0.0662 | - |
	| 1.3912 | 2500 | 0.0013 | - |
	| 1.4190 | 2550 | 0.1195 | - |
	| 1.4469 | 2600 | 0.0004 | - |
	| 1.4747 | 2650 | 0.0028 | - |
	| 1.5025 | 2700 | 0.0002 | - |
	| 1.5303 | 2750 | 0.2196 | - |
	| 1.5582 | 2800 | 0.0011 | - |
	| 1.5860 | 2850 | 0.0086 | - |
	| 1.6138 | 2900 | 0.0017 | - |
	| 1.6416 | 2950 | 0.0048 | - |
	| 1.6694 | 3000 | 0.0003 | - |
	| 1.6973 | 3050 | 0.0003 | - |
	| 1.7251 | 3100 | 0.0002 | - |
	| 1.7529 | 3150 | 0.0002 | - |
	| 1.7807 | 3200 | 0.0003 | - |
	| 1.8086 | 3250 | 0.0001 | - |
	| 1.8364 | 3300 | 0.0002 | - |
	| 1.8642 | 3350 | 0.0133 | - |
	| 1.8920 | 3400 | 0.0003 | - |
	| 1.9199 | 3450 | 0.0003 | - |
	| 1.9477 | 3500 | 0.0007 | - |
	| 1.9755 | 3550 | 0.0005 | - |
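
The Epoch and Step columns can be cross-checked against the training set size and hyperparameters reported in this card. The calculation below assumes that, with `num_iterations` set, SetFit draws `num_iterations` positive and `num_iterations` negative pairs per training sample; that is an assumption about the sampler, not something the card states, but the resulting epoch fractions agree with the logged rows.

```python
import math

# Figures reported elsewhere in this card.
n_samples = 929 + 1467  # Non-recurring + Recurring
num_iterations = 3
batch_size = 8
num_epochs = 2

# Assumption: one positive and one negative pair per sample per iteration.
pairs_per_epoch = 2 * num_iterations * n_samples
steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(pairs_per_epoch, steps_per_epoch, total_steps)  # 14376 1797 3594
print(epoch_at(50), epoch_at(1800), epoch_at(3550))   # 0.0278 1.0017 1.9755
```

Under this reading, the last logged row (step 3550) sits 44 steps before the end of the second epoch.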

	### Framework Versions
	- Python: 3.10.12
	- SetFit: 1.0.2
	- Sentence Transformers: 2.2.2
	- Transformers: 4.35.2
	- PyTorch: 2.1.0+cu121
	- Datasets: 2.16.1
	- Tokenizers: 0.15.0

	## Citation

	### BibTeX
	```bibtex
	@article{https://doi.org/10.48550/arxiv.2209.11055,
	doi = {10.48550/ARXIV.2209.11055},
	url = {https://arxiv.org/abs/2209.11055},
	author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
	keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
	title = {Efficient Few-Shot Learning Without Prompts},
	publisher = {arXiv},
	year = {2022},
	copyright = {Creative Commons Attribution 4.0 International}
	}
	```

	<!--
	## Glossary

	Clearly define terms in order to be accessible across audiences.
	-->

	<!--
	## Model Card Authors

	Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.
	-->

	<!--
	## Model Card Contact

	Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.
	-->