omarelshehy committed • f0ebb68
Parent(s): a9431e5
Update README.md

README.md CHANGED
````diff
@@ -76,24 +76,6 @@ The model is trained using the MatryoshkaLoss for embeddings of size 1024, 786,
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 1024 tokens
 - **Similarity Function:** Cosine Similarity
-<!-- - **Training Dataset:** Unknown -->
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
-
-### Model Sources
-
-- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
-- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
-- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
-
-### Full Model Architecture
-
-```
-SentenceTransformer(
-  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
-  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-)
-```

 ## Usage

````
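The deleted "Full Model Architecture" block and spec bullets are auto-generated model-card boilerplate, and the same information stays recoverable from the checkpoint itself. A minimal sketch of how to query it, assuming `sentence-transformers` is installed and the `omarelshehy/Arabic-STS-Matryoshka` checkpoint loads as in the Usage section below:

```python
from sentence_transformers import SentenceTransformer

# Loading the checkpoint and printing it reproduces the removed
# architecture block (Transformer with XLMRobertaModel -> Pooling).
model = SentenceTransformer("omarelshehy/Arabic-STS-Matryoshka")
print(model)

# The removed spec bullets are also queryable directly:
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 1024
```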
````diff
@@ -114,9 +96,9 @@ matryoshka_dim = 786
 model = SentenceTransformer("omarelshehy/Arabic-STS-Matryoshka", truncate_dim=matryoshka_dim)
 # Run inference
 sentences = [
-    '
-    '
-    '
+    'أحب قراءة الكتب في أوقات فراغي.',
+    'أستمتع بقراءة القصص في المساء قبل النوم.',
+    'القراءة تعزز معرفتي وتفتح أمامي آفاق جديدة.',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
````
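The three added example sentences translate roughly to "I love reading books in my free time.", "I enjoy reading stories in the evening before bed.", and "Reading enriches my knowledge and opens new horizons for me." For readers unfamiliar with the `truncate_dim` argument used here: it keeps only the first `matryoshka_dim` components of each embedding. A sketch of the equivalent manual operation; this is an illustration rather than part of the commit, and the re-normalization step is an assumption so that cosine similarity on the truncated vectors stays meaningful:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Encode at the full 1024 dimensions, then truncate by hand.
model = SentenceTransformer("omarelshehy/Arabic-STS-Matryoshka")
embeddings = model.encode(["أحب قراءة الكتب في أوقات فراغي."])  # shape: (1, 1024)

matryoshka_dim = 786
truncated = embeddings[:, :matryoshka_dim]
# Re-normalize so cosine scores on the truncated vectors behave as expected.
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
print(truncated.shape)  # (1, 786)
```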
````diff
@@ -128,30 +110,6 @@ print(similarities.shape)
 # [3, 3]
 ```

-<!--
-### Direct Usage (Transformers)
-
-<details><summary>Click to see the direct usage in Transformers</summary>
-
-</details>
--->
-
-<!--
-### Downstream Usage (Sentence Transformers)
-
-You can finetune this model on your own dataset.
-
-<details><summary>Click to expand</summary>
-
-</details>
--->
-
-<!--
-### Out-of-Scope Use
-
-*List how the model may foreseeably be misused and address what users ought not to do with the model.*
--->
-
 ## Evaluation

 ### Metrics
````
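The `similarities` variable quoted in the hunk header above comes from the step between the two excerpted snippets. A sketch of that step, assuming sentence-transformers v3+, where `SentenceTransformer.similarity` computes pairwise cosine scores under the model's configured similarity function:

```python
# Pairwise cosine similarity between the three example embeddings;
# the README prints its shape, which is [3, 3].
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```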