Update README.md
README.md
@@ -13,7 +13,7 @@ Disclaimer: The team releasing DINOv2 did not write a model card for this model

## Model description

-The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion
+The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion.

Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder.
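
Since the paragraph above describes how inputs flow through the model (patch embedding, [CLS] token, position embeddings), here is a minimal sketch of extracting the per-patch and [CLS] embeddings with the 🤗 Transformers API; the `facebook/dinov2-base` checkpoint and the example image URL are assumptions chosen for illustration:

```python
from PIL import Image
import requests
from transformers import AutoImageProcessor, AutoModel

# Example image (assumed URL; any RGB image works)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor resizes/normalizes the image; the model splits it into
# fixed-size patches, linearly embeds them, and prepends a [CLS] token.
processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# last_hidden_state has shape (batch, 1 + num_patches, hidden_size):
# index 0 is the [CLS] token, the rest are the patch embeddings.
cls_embedding = outputs.last_hidden_state[:, 0]
patch_embeddings = outputs.last_hidden_state[:, 1:]
```

The [CLS] embedding is the natural input for a downstream classification head, while the patch embeddings support dense tasks such as segmentation or retrieval.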