IEIT-Yuan committed on
Commit 520febe · verified · 1 Parent(s): fed8cab

Update README.md

Files changed (1)
  1. README.md +39 -1
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  model-index:
- - name: Yuan-embedding-1.0
  results:
  - dataset:
  config: default
@@ -1259,3 +1259,41 @@ model-index:
  tags:
  - mteb
  ---
 
  ---
  model-index:
+ - name: PLACEHOLDER
  results:
  - dataset:
  config: default
 
  tags:
  - mteb
  ---
+ ## Yuan-embedding-1.0
+
+ Yuan-embedding-1.0 is an embedding model designed specifically for Chinese text retrieval tasks. It is based on xiaobu-embedding-v2 [1], with the following main changes:
+
+ - In hard negative sampling, a rerank model (bge-reranker-large [2]) is used to rank and filter the data
+
+ - New queries are generated iteratively with an LLM (llama3.1 [3])
+
+ - Training is based on piccolo-embedding [4]
+
+
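The README does not describe how the reranker scores are turned into hard negatives, so the following is only a minimal sketch of one common approach: score every non-positive candidate against the query, drop candidates that score so high they are probably unlabeled positives, and keep the top-scoring remainder. The names `select_hard_negatives`, `toy_score`, `num_negatives`, and `max_score` are hypothetical, and the threshold value is an illustrative assumption. `toy_score` is a stand-in so the example runs without downloading a model; in practice `score_fn` could wrap a cross-encoder reranker such as bge-reranker-large.

```python
from typing import Callable, List, Sequence


def select_hard_negatives(
    query: str,
    candidates: Sequence[str],
    positives: Sequence[str],
    score_fn: Callable[[str, str], float],
    num_negatives: int = 2,
    max_score: float = 0.8,  # assumed cutoff, not taken from the README
) -> List[str]:
    """Pick the highest-scoring non-positive candidates as hard negatives."""
    positive_set = set(positives)
    # Score every candidate that is not a labeled positive.
    scored = [
        (score_fn(query, doc), doc)
        for doc in candidates
        if doc not in positive_set
    ]
    # Skip candidates scored above the cutoff: they are likely false negatives.
    kept = [(score, doc) for score, doc in scored if score <= max_score]
    # Highest score first, so the most confusable negatives come first.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in kept[:num_negatives]]


def toy_score(query: str, doc: str) -> float:
    """Stand-in scorer: fraction of query characters that appear in the doc."""
    chars = set(query)
    return len(chars & set(doc)) / len(chars) if chars else 0.0


hard_negatives = select_hard_negatives(
    query="abcd",
    candidates=["abcd", "dcba", "abcx", "abxy", "wxyz", "abcz"],
    positives=["abcd"],
    score_fn=toy_score,
)
print(hard_negatives)
```

Passing the scorer as a callable keeps the selection logic independent of any particular reranker, so the same function can be exercised with a toy scorer and later with a real model.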
+ ## Usage
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Example usage:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("IEIYuan/Yuan-embedding-1.0")
+ sentences = [
+     "这是一个样例-1",  # "This is sample 1"
+     "这是一个样例-2",  # "This is sample 2"
+ ]
+ embeddings = model.encode(sentences)  # one embedding vector per sentence
+ similarities = model.similarity(embeddings, embeddings)  # pairwise similarity matrix
+ print(similarities)
+ ```
+
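Because the model targets retrieval, a typical next step is to rank documents against a query by embedding similarity. The sketch below assumes cosine similarity as the metric and uses toy 3-dimensional vectors in place of real `model.encode(...)` output so that it runs without the model; `cosine` and `rank_documents` are hypothetical helper names, not part of sentence-transformers.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for embeddings produced by the model.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
ranking = rank_documents(query_vec, doc_vecs, top_k=2)
print(ranking)
```

With real embeddings, the query and the documents would each be passed through `model.encode` and the resulting vectors supplied in place of the toy lists.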
+ ## Reference
+
+ 1. https://huggingface.co/lier007/xiaobu-embedding-v2
+ 2. https://huggingface.co/BAAI/bge-reranker-large
+ 3. https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
+ 4. https://github.com/hjq133/piccolo-embedding