Update README.md
README.md
@@ -1,6 +1,6 @@
 ---
 model-index:
-- name:
+- name: PLACEHOLDER
   results:
   - dataset:
       config: default
@@ -1259,3 +1259,41 @@ model-index:
 tags:
 - mteb
 ---

## Yuan-embedding-1.0

Yuan-embedding-1.0 is an embedding model designed specifically for Chinese text retrieval. It is based on xiaobu-embedding-v2 [1], with the following main changes:

- In hard negative sampling, a reranker model (bge-reranker-large [2]) is used to rank and filter the candidate data

- New queries are generated iteratively with an LLM (llama3.1 [3])

- Training builds on piccolo-embedding [4]
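
The first change, reranker-based filtering of hard negatives, can be pictured with a small sketch. The README does not say how reranker scores are turned into a selection, so everything below is an assumption: `select_hard_negatives` is a hypothetical helper, `score_fn` stands in for a cross-encoder such as bge-reranker-large (higher score means more relevant), candidates scoring within a hypothetical `margin` of the positive are dropped as likely false negatives, and the highest-scoring remaining candidates are kept as hard negatives.

```python
from typing import Callable, List, Sequence, Tuple


def select_hard_negatives(
    query: str,
    positive: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    num_negatives: int = 7,
    margin: float = 0.1,
) -> List[str]:
    """Pick hard negatives for one (query, positive) pair using reranker scores.

    score_fn(query, passage) is a hypothetical stand-in for a cross-encoder
    reranker such as bge-reranker-large; a higher score means more relevant.
    num_negatives and margin are illustrative values, not the authors' settings.
    """
    positive_score = score_fn(query, positive)
    kept: List[Tuple[float, str]] = []
    for passage in candidates:
        if passage == positive:
            continue  # the labelled positive can never be a negative
        score = score_fn(query, passage)
        # A candidate scored (almost) as relevant as the positive is probably an
        # unlabelled positive, i.e. a false negative, so it is filtered out.
        if score > positive_score - margin:
            continue
        kept.append((score, passage))
    # Sort the survivors by reranker score: the highest-scoring ones are the
    # "hardest" negatives because they look most relevant to the query.
    kept.sort(key=lambda item: item[0], reverse=True)
    return [passage for _, passage in kept[:num_negatives]]
```

With sentence-transformers, one possible `score_fn` wraps `CrossEncoder("BAAI/bge-reranker-large").predict`, which scores a list of (query, passage) pairs. Filtering against the positive's score (rather than a fixed threshold) keeps the sketch independent of the reranker's score scale.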

## Usage

```bash
pip install -U sentence-transformers
```

Example usage:

```python
from sentence_transformers import SentenceTransformer

# Load the model (downloaded from the Hugging Face Hub on first use)
model = SentenceTransformer("IEIYuan/Yuan-embedding-1.0")

# The model targets Chinese text ("这是一个样例" means "This is an example")
sentences = [
    "这是一个样例-1",
    "这是一个样例-2",
]

# Encode the sentences into dense vectors
embeddings = model.encode(sentences)

# Pairwise similarity matrix between all embeddings (2 x 2 here)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
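
The usage example compares every sentence with every other one. For retrieval, the usual pattern is to encode one query and a corpus of passages and rank the passages by similarity. The helper below is a minimal NumPy sketch of that ranking step. `top_k_passages` is a hypothetical name, and the sketch assumes cosine similarity, which is the sentence-transformers default for `model.similarity`.

```python
import numpy as np


def top_k_passages(query_emb, passage_embs, k=3):
    """Rank passages by cosine similarity to a single query embedding.

    query_emb: 1-D array-like of shape (dim,)
    passage_embs: 2-D array-like of shape (num_passages, dim)
    Returns a list of (passage_index, score) pairs, best match first.
    """
    q = np.asarray(query_emb, dtype=np.float64)
    p = np.asarray(passage_embs, dtype=np.float64)
    # L2-normalise so that a dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    scores = p @ q
    # Indices of the k highest scores, in descending order
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

In practice the inputs would come from the model, for example `query_emb = model.encode("...")` and `passage_embs = model.encode(passages)`. For larger corpora, `sentence_transformers.util.semantic_search` performs the same top-k cosine search directly on the encoded tensors.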

## References

1. xiaobu-embedding-v2: https://huggingface.co/lier007/xiaobu-embedding-v2
2. bge-reranker-large: https://huggingface.co/BAAI/bge-reranker-large
3. Llama-3.1-8B-Instruct: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
4. piccolo-embedding: https://github.com/hjq133/piccolo-embedding