|
--- |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- feature-extraction |
|
- sentence-similarity |
|
- transformers |
|
|
--- |
|
|
|
# sentence-transformers/clip-ViT-B-32 |
|
|
|
This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for tasks like clustering or semantic search. Because it wraps OpenAI's CLIP ViT-B/32, it embeds images into the same vector space as text, which also enables tasks such as image search with text queries.
|
|
|
|
|
|
|
## Usage (Sentence-Transformers) |
|
|
|
This model is easy to use once you have [sentence-transformers](https://www.SBERT.net) installed:
|
|
|
```bash
|
pip install -U sentence-transformers |
|
``` |
|
|
|
Then you can use the model like this: |
|
|
|
```python |
|
from sentence_transformers import SentenceTransformer |
|
sentences = ["This is an example sentence", "Each sentence is converted"] |
|
|
|
model = SentenceTransformer('sentence-transformers/clip-ViT-B-32') |
|
embeddings = model.encode(sentences) |
|
print(embeddings) |
|
``` |
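
Because this is a CLIP model, text and images share the same embedding space, so you can score image/caption similarity directly. A minimal sketch (the image file name is a placeholder; `model.encode` accepts PIL images for CLIP models):

```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image

model = SentenceTransformer('sentence-transformers/clip-ViT-B-32')

# Encode an image (the file name is a placeholder; any PIL image works)
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))

# Encode candidate text descriptions
text_emb = model.encode(['Two dogs in the snow',
                         'A cat on a table',
                         'A picture of London at night'])

# Cosine similarity between the image and each caption
cos_scores = util.cos_sim(img_emb, text_emb)
print(cos_scores)
```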
|
|
|
|
|
|
|
## Evaluation Results |
|
|
|
|
|
|
|
For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=sentence-transformers/clip-ViT-B-32) |
|
|
|
|
|
|
|
## Full Model Architecture |
|
``` |
|
SentenceTransformer( |
|
(0): CLIPModel( |
|
(model): CLIP( |
|
(visual): VisualTransformer( |
|
(conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False) |
|
(ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(transformer): Transformer( |
|
(resblocks): Sequential( |
|
(0): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(1): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(2): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(3): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(4): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(5): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(6): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(7): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(8): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(9): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(10): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(11): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
) |
|
) |
|
(ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(transformer): Transformer( |
|
(resblocks): Sequential( |
|
(0): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(1): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(2): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(3): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(4): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(5): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(6): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(7): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(8): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(9): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(10): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
(11): ResidualAttentionBlock( |
|
(attn): MultiheadAttention( |
|
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) |
|
) |
|
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
(mlp): Sequential( |
|
(c_fc): Linear(in_features=512, out_features=2048, bias=True) |
|
(gelu): QuickGELU() |
|
(c_proj): Linear(in_features=2048, out_features=512, bias=True) |
|
) |
|
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
) |
|
) |
|
(token_embedding): Embedding(49408, 512) |
|
(ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True) |
|
) |
|
) |
|
) |
|
``` |
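
As the printout shows, the vision encoder is a 12-layer transformer with width 768 and the text encoder a 12-layer transformer with width 512; both are projected into the shared 512-dimensional CLIP embedding space. A minimal sketch to sanity-check the output dimensionality:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/clip-ViT-B-32')

# Embeddings come out of the shared CLIP projection space
emb = model.encode(["a quick dimensionality check"])
print(emb.shape)  # expected: (1, 512)
```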
|
|
|
## Citing & Authors |
|
|
|
This model is a port of OpenAI's CLIP ViT-B/32 for use with [sentence-transformers](https://www.sbert.net/).
|
|
|
If you find this model helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084): |
|
```bibtex |
|
@inproceedings{reimers-2019-sentence-bert, |
|
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
|
author = "Reimers, Nils and Gurevych, Iryna", |
|
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
|
month = "11", |
|
year = "2019", |
|
publisher = "Association for Computational Linguistics", |
|
url = "http://arxiv.org/abs/1908.10084", |
|
} |
|
``` |