---
license: cc-by-nc-4.0
model-index:
- name: CondViT-B16-cat
  results:
  - dataset:
      name: LAION - Referred Visual Search - Fashion
      split: test
      type: Slep/LAION-RVS-Fashion
    metrics:
    - name: R@1 +10K Dist.
      type: recall_at_1|10000
      value: 93.44 ± 0.83
    - name: R@5 +10K Dist.
      type: recall_at_5|10000
      value: 98.07 ± 0.37
    - name: R@10 +10K Dist.
      type: recall_at_10|10000
      value: 98.69 ± 0.38
    - name: R@20 +10K Dist.
      type: recall_at_20|10000
      value: 98.98 ± 0.34
    - name: R@50 +10K Dist.
      type: recall_at_50|10000
      value: 99.55 ± 0.18
    - name: R@1 +100K Dist.
      type: recall_at_1|100000
      value: 85.90 ± 1.37
    - name: R@5 +100K Dist.
      type: recall_at_5|100000
      value: 94.22 ± 0.87
    - name: R@10 +100K Dist.
      type: recall_at_10|100000
      value: 96.04 ± 0.68
    - name: R@20 +100K Dist.
      type: recall_at_20|100000
      value: 97.18 ± 0.56
    - name: R@50 +100K Dist.
      type: recall_at_50|100000
      value: 98.28 ± 0.34
    - name: R@1 +500K Dist.
      type: recall_at_1|500000
      value: 78.19 ± 1.59
    - name: R@5 +500K Dist.
      type: recall_at_5|500000
      value: 88.70 ± 1.15
    - name: R@10 +500K Dist.
      type: recall_at_10|500000
      value: 91.46 ± 1.02
    - name: R@20 +500K Dist.
      type: recall_at_20|500000
      value: 94.07 ± 0.86
    - name: R@50 +500K Dist.
      type: recall_at_50|500000
      value: 96.11 ± 0.64
    - name: R@1 +1M Dist.
      type: recall_at_1|1000000
      value: 74.49 ± 1.23
    - name: R@5 +1M Dist.
      type: recall_at_5|1000000
      value: 85.38 ± 1.29
    - name: R@10 +1M Dist.
      type: recall_at_10|1000000
      value: 88.95 ± 1.15
    - name: R@20 +1M Dist.
      type: recall_at_20|1000000
      value: 91.35 ± 0.93
    - name: R@50 +1M Dist.
      type: recall_at_50|1000000
      value: 94.75 ± 0.75
    - name: Available Dists.
      type: n_dists
      value: 2000014
    - name: Embedding Dimension
      type: embedding_dim
      value: 512
    - name: Conditioning
      type: conditioning
      value: category
    source:
      name: LRVSF Leaderboard
      url: https://huggingface.co/spaces/Slep/LRVSF-Leaderboard
    task:
      type: Retrieval
tags:
- lrvsf-benchmark
datasets:
- Slep/LAION-RVS-Fashion
---
# Conditional ViT - B/16 - Categories

*Introduced in <a href=https://arxiv.org/abs/2306.02928>**LRVS-Fashion: Extending Visual Search with Referring Instructions**</a>, Lepage et al. 2023*
<div align="center">
<div id=links>

|Data|Code|Models|Spaces|
|:-:|:-:|:-:|:-:|
|[Full Dataset](https://huggingface.co/datasets/Slep/LAION-RVS-Fashion)|[Training Code](https://github.com/Simon-Lepage/CondViT-LRVSF)|[Categorical Model](https://huggingface.co/Slep/CondViT-B16-cat)|[LRVS-F Leaderboard](https://huggingface.co/spaces/Slep/LRVSF-Leaderboard)|
|[Test set](https://zenodo.org/doi/10.5281/zenodo.11189942)|[Benchmark Code](https://github.com/Simon-Lepage/LRVSF-Benchmark)|[Textual Model](https://huggingface.co/Slep/CondViT-B16-txt)|[Demo](https://huggingface.co/spaces/Slep/CondViT-LRVSF-Demo)|

</div>
</div>

## General Information
This model is fine-tuned from CLIP ViT-B/16 on LRVS-F at a resolution of 224x224. The conditioning categories are the following:

- Bags
- Feet
- Hands
- Head
- Lower Body
- Neck
- Outwear
- Upper Body
- Waist
- Whole Body

For research use only.
## How to Use

```python
from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel
import torch

# Load the conditional ViT and its processor.
# trust_remote_code=True is assumed to be required here, since CondViT ships
# its own modeling code on the Hub rather than a native transformers architecture.
model = AutoModel.from_pretrained("Slep/CondViT-B16-cat", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("Slep/CondViT-B16-cat", trust_remote_code=True)

# Example image from LAION-RVS-Fashion, conditioned on the "Bags" category.
url = "https://huggingface.co/datasets/Slep/LAION-RVS-Fashion/resolve/main/assets/108856.0.jpg"
img = Image.open(requests.get(url, stream=True).raw)
cat = "Bags"

# Preprocess the image together with its conditioning category, then embed it.
inputs = processor(images=[img], categories=[cat])
raw_embedding = model(**inputs)

# L2-normalize the embedding before computing cosine similarities.
normalized_embedding = torch.nn.functional.normalize(raw_embedding, dim=-1)
```
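The embeddings are meant for referred visual search: query and gallery images are embedded with a conditioning category and compared by cosine similarity. Below is a minimal retrieval sketch that continues the example above (it reuses `model`, `processor`, `img`, and `normalized_embedding`); the toy gallery and the batched `processor(images=..., categories=...)` call are illustrative assumptions rather than an official API.

```python
# Toy gallery: the demo image repeated with different conditioning categories.
# In practice, gallery_imgs / gallery_cats would come from your own catalog.
gallery_imgs = [img, img, img]
gallery_cats = ["Bags", "Upper Body", "Feet"]

with torch.no_grad():
    gallery_inputs = processor(images=gallery_imgs, categories=gallery_cats)
    gallery_emb = torch.nn.functional.normalize(model(**gallery_inputs), dim=-1)

# Rank gallery items by cosine similarity to the normalized query embedding.
scores = (normalized_embedding @ gallery_emb.T).squeeze(0)  # shape: (n_gallery,)
for rank, idx in enumerate(scores.argsort(descending=True).tolist(), start=1):
    print(f"{rank}. gallery item {idx} - cosine similarity {scores[idx].item():.3f}")
```

At the scale of the benchmark's distractor sets, the same normalized embeddings would typically be stored in an approximate nearest-neighbor index rather than scored with a dense matrix product.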