---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
---

# sentence-transformers/clip-ViT-B-32

This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for tasks like clustering or semantic search. Since it wraps OpenAI's CLIP ViT-B/32, it can also embed images into the same vector space (see the image-text similarity sketch at the end of this card).

## Usage (Sentence-Transformers)

Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/clip-ViT-B-32')
embeddings = model.encode(sentences)
print(embeddings)
```

## Evaluation Results

For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=sentence-transformers/clip-ViT-B-32)

## Full Model Architecture

```
SentenceTransformer(
  (0): CLIPModel(
    (model): CLIP(
      (visual): VisualTransformer(
        (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
        (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (transformer): Transformer(
          (resblocks): Sequential(
            (0): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (1): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (2): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (3): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (4): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (5): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (6): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (7): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (8): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (9): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (10): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
            (11): ResidualAttentionBlock(
              (attn): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
              )
              (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (mlp): Sequential(
                (c_fc): Linear(in_features=768, out_features=3072, bias=True)
                (gelu): QuickGELU()
                (c_proj): Linear(in_features=3072, out_features=768, bias=True)
              )
              (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
        (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (transformer): Transformer(
        (resblocks): Sequential(
          (0): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (1): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (2): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (3): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (4): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (5): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (6): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (7): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (8): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (9): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (10): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (11): ResidualAttentionBlock(
            (attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
            )
            (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): Sequential(
              (c_fc): Linear(in_features=512, out_features=2048, bias=True)
              (gelu): QuickGELU()
              (c_proj): Linear(in_features=2048, out_features=512, bias=True)
            )
            (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
        )
      )
      (token_embedding): Embedding(49408, 512)
      (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    )
  )
)
```

## Citing & Authors

This model is provided by [sentence-transformers](https://www.sbert.net/); the underlying CLIP ViT-B/32 weights were trained and released by OpenAI. If you find this model helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "http://arxiv.org/abs/1908.10084",
}
```
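
## Image-Text Similarity (Sketch)

Because this checkpoint wraps CLIP, `model.encode` also accepts PIL images, so images and text are embedded into the same 512-dimensional space and can be compared directly. The snippet below is a minimal sketch of that workflow; the file name `two_dogs_in_snow.jpg` is only a placeholder for any local image, and `util.cos_sim` from sentence-transformers is used for the comparison.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('sentence-transformers/clip-ViT-B-32')

# Encode an image by passing a PIL Image directly to encode().
# 'two_dogs_in_snow.jpg' is a placeholder path; substitute any local image file.
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))

# Encode candidate captions into the same vector space.
text_emb = model.encode([
    'Two dogs playing in the snow',
    'A cat sitting on a table',
    'A picture of London at night',
])

# Cosine similarity between the image embedding and each caption embedding.
cos_scores = util.cos_sim(img_emb, text_emb)
print(cos_scores)
```

The highest-scoring caption is the one the model considers the closest textual match for the image, which is the basis for zero-shot image search and classification with this model.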