shibing624
committed on
Commit
·
11ae34a
1
Parent(s):
542bce3
Update README.md
README.md
CHANGED
@@ -126,10 +126,34 @@ print(sentence_embeddings)
|
|
126 |
## Full Model Architecture
|
127 |
```
|
128 |
CoSENT(
|
129 |
-
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model:
|
130 |
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_mean_tokens': True})
|
131 |
)
|
132 |
```
|
133 |
## Citing & Authors
|
134 |
This model was trained by [text2vec](https://github.com/shibing624/text2vec).
|
135 |
|
|
|
126 |
## Full Model Architecture
|
127 |
```
|
128 |
CoSENT(
|
129 |
+
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: ErnieModel
|
130 |
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_mean_tokens': True})
|
131 |
)
|
132 |
```
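The `Pooling` layer above has `pooling_mode_mean_tokens` set to `True`, which means the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are assumptions made for illustration (the real model uses 768 dimensions).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding (mask 0) is skipped."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [value / count for value in total]


# Hypothetical toy input: two real tokens and one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padding vector does not affect the result because its mask entry is 0.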
|
133 |
+
|
134 |
+
|
135 |
+
## Intended uses
|
136 |
+
|
137 |
+
Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures
|
138 |
+
the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks.
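As an illustration of the similarity and retrieval uses, sentence vectors are typically compared with cosine similarity. The sketch below uses hand-written 3-dimensional vectors as stand-ins for real model outputs; the helper name `cosine_similarity` and the example data are assumptions, not part of this model's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings standing in for encoded sentences.
query = [1.0, 0.0, 1.0]
candidates = {"close": [2.0, 0.0, 2.0], "far": [0.0, 1.0, 0.0]}

# Rank candidates from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

Cosine similarity ignores vector length, so `close` scores highest even though it is twice as long as the query.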
|
139 |
+
|
140 |
+
By default, input text longer than 256 word pieces is truncated.
|
141 |
+
|
142 |
+
|
143 |
+
## Training procedure
|
144 |
+
|
145 |
+
### Pre-training
|
146 |
+
|
147 |
+
We use the pretrained [`nghuyong/ernie-3.0-base-zh`](https://huggingface.co/nghuyong/ernie-3.0-base-zh) model.
|
148 |
+
Please refer to the model card for more detailed information about the pre-training procedure.
|
149 |
+
|
150 |
+
### Fine-tuning
|
151 |
+
|
152 |
+
We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each
|
153 |
+
possible sentence pair in the batch.
|
154 |
+
We then apply a ranking loss that compares the similarities of true (positive) pairs against those of false (negative) pairs.
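One way to picture such a ranking objective is a CoSENT-style loss, which penalizes every case where a false pair scores higher than a true pair. The sketch below is an assumption-laden illustration rather than the training code used for this model: the function name `cosent_style_loss` and the scale factor of 20 are hypothetical choices.

```python
import math


def cosent_style_loss(cos_scores, labels, scale=20.0):
    """log(1 + sum(exp(scale * (s_neg - s_pos)))) over pairs where label_pos > label_neg."""
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the log
    for s_pos, y_pos in zip(cos_scores, labels):
        for s_neg, y_neg in zip(cos_scores, labels):
            if y_pos > y_neg:
                terms.append(scale * (s_neg - s_pos))
    # Numerically stable log-sum-exp.
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


# Hypothetical cosine scores for two sentence pairs: label 1 = true pair, 0 = false pair.
well_ranked = cosent_style_loss([0.9, 0.1], [1, 0])
badly_ranked = cosent_style_loss([0.1, 0.9], [1, 0])
print(well_ranked, badly_ranked)
```

The loss is close to zero when true pairs already score above false pairs and grows when the order is reversed.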
|
155 |
+
|
156 |
+
|
157 |
## Citing & Authors
|
158 |
This model was trained by [text2vec](https://github.com/shibing624/text2vec).
|
159 |
|