---
license: cc-by-nc-4.0
model-index:
- name: CondViT-B16-cat
  results:
  - dataset:
      name: LAION - Referred Visual Search - Fashion
      split: test
      type: Slep/LAION-RVS-Fashion
    metrics:
    - name: R@1 +10K Dist.
      type: recall_at_1|10000
      value: 93.44 ± 0.83
    - name: R@5 +10K Dist.
      type: recall_at_5|10000
      value: 98.07 ± 0.37
    - name: R@10 +10K Dist.
      type: recall_at_10|10000
      value: 98.69 ± 0.38
    - name: R@20 +10K Dist.
      type: recall_at_20|10000
      value: 98.98 ± 0.34
    - name: R@50 +10K Dist.
      type: recall_at_50|10000
      value: 99.55 ± 0.18
    - name: R@1 +100K Dist.
      type: recall_at_1|100000
      value: 85.90 ± 1.37
    - name: R@5 +100K Dist.
      type: recall_at_5|100000
      value: 94.22 ± 0.87
    - name: R@10 +100K Dist.
      type: recall_at_10|100000
      value: 96.04 ± 0.68
    - name: R@20 +100K Dist.
      type: recall_at_20|100000
      value: 97.18 ± 0.56
    - name: R@50 +100K Dist.
      type: recall_at_50|100000
      value: 98.28 ± 0.34
    - name: R@1 +500K Dist.
      type: recall_at_1|500000
      value: 78.19 ± 1.59
    - name: R@5 +500K Dist.
      type: recall_at_5|500000
      value: 88.70 ± 1.15
    - name: R@10 +500K Dist.
      type: recall_at_10|500000
      value: 91.46 ± 1.02
    - name: R@20 +500K Dist.
      type: recall_at_20|500000
      value: 94.07 ± 0.86
    - name: R@50 +500K Dist.
      type: recall_at_50|500000
      value: 96.11 ± 0.64
    - name: R@1 +1M Dist.
      type: recall_at_1|1000000
      value: 74.49 ± 1.23
    - name: R@5 +1M Dist.
      type: recall_at_5|1000000
      value: 85.38 ± 1.29
    - name: R@10 +1M Dist.
      type: recall_at_10|1000000
      value: 88.95 ± 1.15
    - name: R@20 +1M Dist.
      type: recall_at_20|1000000
      value: 91.35 ± 0.93
    - name: R@50 +1M Dist.
      type: recall_at_50|1000000
      value: 94.75 ± 0.75
    - name: Available Dists.
      type: n_dists
      value: 2000014
    - name: Embedding Dimension
      type: embedding_dim
      value: 512
    - name: Conditioning
      type: conditioning
      value: category
    source:
      name: LRVSF Leaderboard
      url: https://huggingface.co/spaces/Slep/LRVSF-Leaderboard
    task:
      type: Retrieval
tags:
- lrvsf-benchmark
datasets:
- Slep/LAION-RVS-Fashion
---
# Conditional ViT - B/16 - Categories
*Introduced in <a href="https://arxiv.org/abs/2306.02928">**LRVS-Fashion: Extending Visual Search with Referring Instructions**</a>, Lepage et al., 2023*
<div align="center">
<div id="links">

|Data|Code|Models|Spaces|
|:-:|:-:|:-:|:-:|
|[Full Dataset](https://huggingface.co/datasets/Slep/LAION-RVS-Fashion)|[Training Code](https://github.com/Simon-Lepage/CondViT-LRVSF)|[Categorical Model](https://huggingface.co/Slep/CondViT-B16-cat)|[LRVS-F Leaderboard](https://huggingface.co/spaces/Slep/LRVSF-Leaderboard)|
|[Test set](https://zenodo.org/doi/10.5281/zenodo.11189942)|[Benchmark Code](https://github.com/Simon-Lepage/LRVSF-Benchmark)|[Textual Model](https://huggingface.co/Slep/CondViT-B16-txt)|[Demo](https://huggingface.co/spaces/Slep/CondViT-LRVSF-Demo)|

</div>
</div>

## General Information
Model fine-tuned from CLIP ViT-B/16 on LAION-RVS-Fashion (LRVS-F) at 224×224 resolution. The conditioning categories are the following:

- Bags
- Feet
- Hands
- Head
- Lower Body
- Neck
- Outwear
- Upper Body
- Waist
- Whole Body

For research use only.
## How to Use
```python
from PIL import Image
import requests
import torch
from transformers import AutoProcessor, AutoModel

# Load the model and its processor from the Hub.
model = AutoModel.from_pretrained("Slep/CondViT-B16-cat")
processor = AutoProcessor.from_pretrained("Slep/CondViT-B16-cat")

# Example query image from LAION-RVS-Fashion, conditioned on the "Bags" category.
url = "https://huggingface.co/datasets/Slep/LAION-RVS-Fashion/resolve/main/assets/108856.0.jpg"
img = Image.open(requests.get(url, stream=True).raw)
cat = "Bags"

# Preprocess the image, encode the category, and compute the embedding.
inputs = processor(images=[img], categories=[cat])
raw_embedding = model(**inputs)
normalized_embedding = torch.nn.functional.normalize(raw_embedding, dim=-1)
```
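
The model produces one 512-dimensional embedding per (image, category) pair (see the embedding dimension reported above), so retrieval amounts to nearest-neighbour search in that space. Below is a minimal sketch of how the normalized query embedding from the snippet above could be scored against a gallery; the random `gallery_embeddings` tensor is only a stand-in for the normalized embeddings of your own product images, which would be computed once with the same model and cached.

```python
# Continuing from the snippet above: `normalized_embedding` has shape (1, 512).
import torch

# Stand-in gallery. In practice, embed each product image with the model
# (conditioned on its own category), normalize, and cache the results.
gallery_embeddings = torch.nn.functional.normalize(torch.randn(10_000, 512), dim=-1)

# On L2-normalized embeddings, cosine similarity is a plain dot product.
scores = normalized_embedding @ gallery_embeddings.T  # shape (1, 10000)

# Retrieve the indices of the 5 most similar gallery items.
top_scores, top_indices = scores.topk(k=5, dim=-1)
print(top_indices)
```

This mirrors the benchmark setting reported above, where recall@K is measured on galleries padded with large numbers of distractors (10K up to 1M).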