d1mitriz committed
Commit · 10df9cf · 1 Parent(s): 22a6274
Fixed model card, try 5
README.md CHANGED
@@ -14,20 +14,26 @@ metrics:
 model-index:
 - name: st-greek-media-bert-base-uncased
   results: [
-
-
-
-
-
-
-
-
-
-
+  {
+    "task": {
+      "name": "STS Benchmark",
+      "type": "sentence-similarity"
+    },
+    "metrics": {
+      "accuracy_cosinus": 0.9563965089445283,
+      "accuracy_euclidean": 0.9566394253292384,
+      "accuracy_manhattan": 0.9565353183072198
+    },
+    "dataset": {
+      "name": "all_custom_greek_media_triplets",
+      "type": "sentence-pair"
+    },
+  }
 ]
 ---
 
 sentence_transformers.losses.TripletLoss.TripletLoss` with parameters:
+
 ```
 {'distance_metric': 'TripletDistanceMetric.EUCLIDEAN', 'triplet_margin': 5}
 
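The commit message ("try 5") suggests repeated attempts at getting this `model-index` front matter to validate. One way to see whether the YAML parses after pushing is to load the card with `huggingface_hub`; a minimal sketch, not part of the card, assuming the Hub id `dimitriz/st-greek-media-bert-base-uncased`:

```python
from huggingface_hub import ModelCard

# Fetch the README from the Hub and parse its YAML front matter.
card = ModelCard.load("dimitriz/st-greek-media-bert-base-uncased")

# card.data holds the parsed metadata; inspect it to see whether the
# model-index block came through as intended.
print(card.data)
```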
@@ -43,7 +49,9 @@ This is a [sentence-transformers](https://www.SBERT.net) based on the [Greek Med
 Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
 
 ```
+
 pip install -U sentence-transformers
+
 ```
 
 Then you can use the model like this:
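The usage snippet that follows this line sits between hunks, so the diff shows only its tail (`embeddings = model.encode(sentences)` and `print(embeddings)` appear as context in the next hunk). It follows the stock sentence-transformers pattern; a minimal sketch, with the Hub id assumed from the card:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the model from the Hub and embed the sentences.
model = SentenceTransformer("dimitriz/st-greek-media-bert-base-uncased")
embeddings = model.encode(sentences)
print(embeddings)
```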
@@ -57,19 +65,20 @@ embeddings = model.encode(sentences)
 print(embeddings)
 ```
 
-
-
 ## Usage (HuggingFace Transformers)
-
+
+Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: First, you pass your input
+through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word
+embeddings.
 
 ```python
 from transformers import AutoTokenizer, AutoModel
 import torch
 
 
-#Mean Pooling - Take attention mask into account for correct averaging
+# Mean Pooling - Take attention mask into account for correct averaging
 def mean_pooling(model_output, attention_mask):
-    token_embeddings = model_output[0]
+    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
     return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
 
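The rest of this snippet is also elided between hunks; the visible context in the next hunk (`print("Sentence embeddings:")`, `print(sentence_embeddings)`) indicates it is the stock HuggingFace pattern that pairs with `mean_pooling` above. A sketch of that elided portion, continuing the imports from the block above, with the Hub id assumed from the card:

```python
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load model and tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("dimitriz/st-greek-media-bert-base-uncased")
model = AutoModel.from_pretrained("dimitriz/st-greek-media-bert-base-uncased")

# Tokenize, run the forward pass, then mean-pool over the token embeddings.
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])

print("Sentence embeddings:")
print(sentence_embeddings)
```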
@@ -95,20 +104,23 @@ print("Sentence embeddings:")
 print(sentence_embeddings)
 ```
 
-
-
 ## Evaluation Results
 
 <!--- Describe how your model was evaluated -->
 
-For an automated evaluation of this model, see the *Sentence Embeddings
-
+For an automated evaluation of this model, see the *Sentence Embeddings
+Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=dimitriz/st-greek-media-bert-base-uncased)
 
 ## Training
-
-
-
-
+
+The model was trained on a custom dataset containing triplets from the **combined** Greek 'internet', 'social-media'
+and 'press' domains, described in the paper [DACL](https://...).
+
+- The dataset was created by sampling triplets of sentences from the same domain, where the first two sentences are more
+similar than the third one.
+- Training objective was to maximize the similarity between the first two sentences and minimize the similarity between
+the first and the third sentence.
+- The model was trained for 3 epochs with a batch size of 16 and a maximum sequence length of 512 tokens.
 - The model was trained on a single NVIDIA RTX A6000 GPU with 48GB of memory.
 
 The model was trained with the parameters:
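The triplet setup those bullets describe maps directly onto the sentence-transformers training API. A minimal sketch of how such triplets are fed in (the sentences and the base-model id are placeholders, not taken from the card):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer

# Each training example is an (anchor, positive, negative) triplet
# sampled from the same domain, per the description above.
train_examples = [
    InputExample(texts=["anchor sentence", "more similar sentence", "less similar sentence"]),
    # ... one InputExample per triplet
]

# Placeholder base-model id; the card names a Greek Media BERT base model.
model = SentenceTransformer("dimitriz/greek-media-bert-base-uncased")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
```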
@@ -116,6 +128,7 @@ The model was trained with the parameters:
 **DataLoader**:
 
 `torch.utils.data.dataloader.DataLoader` of length 10807 with parameters:
+
 ```
 {'batch_size': 16, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
 ```
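The DataLoader length also pins down the dataset size: 10807 batches at `batch_size=16` means between 172,897 and 172,912 triplets, depending on whether the last batch is full. A quick check of that arithmetic:

```python
import math

batches, batch_size = 10807, 16

# Any dataset size in this range yields exactly 10807 batches.
low, high = (batches - 1) * batch_size + 1, batches * batch_size
assert math.ceil(low / batch_size) == batches
assert math.ceil(high / batch_size) == batches
print(low, high)  # 172897 172912
```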
@@ -123,11 +136,13 @@ The model was trained with the parameters:
 **Loss**:
 
 `sentence_transformers.losses.TripletLoss.TripletLoss` with parameters:
+
 ```
 {'distance_metric': 'TripletDistanceMetric.EUCLIDEAN', 'triplet_margin': 5}
 ```
 
 Parameters of the fit()-Method:
+
 ```
 {
     "epochs": 3,
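These loss and `fit()` parameters translate one-to-one into sentence-transformers calls. A sketch continuing the training setup from the previous sketch (`model` and `train_dataloader` from there); only `epochs` is visible in this diff, so the remaining `fit()` keys are omitted rather than guessed:

```python
from sentence_transformers import losses
from sentence_transformers.losses import TripletDistanceMetric

# TripletLoss exactly as the card reports it: Euclidean distance, margin 5.
train_loss = losses.TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
)
```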
@@ -145,8 +160,8 @@ Parameters of the fit()-Method:
 }
 ```
 
-
 ## Full Model Architecture
+
 ```
 SentenceTransformer(
   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
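The architecture block is cut off after the Transformer module; the full printout can be reproduced by loading the model and printing it (Hub id assumed from the card):

```python
from sentence_transformers import SentenceTransformer

# str(model) lists every module: the BertModel transformer shown above
# plus whatever pooling layer the truncated card output omits.
model = SentenceTransformer("dimitriz/st-greek-media-bert-base-uncased")
print(model)
```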