raynardj/xlsearch-cross-lang-search-zh-vs-classicical-cn

raynardj committed on Nov 29, 2021

Commit 5c4a81e • 1 Parent(s): 7fa95d1

Update README.md

Files changed (1)

README.md +37 -1

@@ -14,4 +14,40 @@ tags:
 * This model helps you **find** text within **ancient Chinese** literature, but you can **search with modern Chinese**
 # 跨语种搜索 (Cross-lingual search)
-## 博古搜今
+## 博古搜今 (Search the ancient with the modern)
+```python
+from unpackai.interp import CosineSearch
+from sentence_transformers import SentenceTransformer
+import pandas as pd
+import numpy as np
+TAG = "raynardj/xlsearch-cross-lang-search-zh-vs-classicical-cn"
+encoder = SentenceTransformer(TAG)
+# all_lines is a list of all your sentences: it can be one book split into
+# sentences, or many books
+all_lines = ["句子1","句子2",...]  # placeholders ("sentence 1", "sentence 2"); replace the `...` with your own sentences
+vec = encoder.encode(all_lines, batch_size=32, show_progress_bar=True)
+# cosine-distance searcher
+cosine = CosineSearch(vec)
+def search(text):
+    enc = encoder.encode(text) # encode the search key
+    order = cosine(enc) # used below as row indices into all_lines, ranked by the searcher
+    sentence_df = pd.DataFrame({"sentence":np.array(all_lines)[order[:5]]}) # keep the first 5 matches
+    return sentence_df
+```
+After splitting the Shiji (史记, Records of the Grand Historian) into sentences, a search returns the following:
+```python
+>>> search("他是一个很慷慨的人")  # modern Chinese query: "He is a very generous person"
+```
+```
+sentence
+0	季布者，楚人也。为气任侠，有名於楚。
+1	董仲舒为人廉直。
+2	大将军为人仁善退让，以和柔自媚於上，然天下未有称也。
+3	勃为人木彊敦厚，高帝以为可属大事。
+4	石奢者，楚昭王相也。坚直廉正，无所阿避。
+```
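
The README snippet above relies on `CosineSearch` to rank sentences against the query embedding. As a rough illustration of that ranking step, here is a minimal, dependency-free sketch of cosine-similarity ranking. It is not the `unpackai` implementation: `cosine_rank` and `toy_vectors` are hypothetical names introduced here, and the 2-D vectors merely stand in for the output of `encoder.encode(all_lines)`.

```python
import math


def cosine_rank(query, base):
    """Return the row indices of `base`, most similar to `query` first.

    Hypothetical helper for illustration only; it is not part of unpackai.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    q_norm = norm(query)
    scores = []
    for row in base:
        denom = q_norm * norm(row)
        # Zero vectors have no direction, so give them a neutral score.
        dot = sum(a * b for a, b in zip(query, row))
        scores.append(dot / denom if denom else 0.0)
    # Sort row indices by descending cosine similarity.
    return sorted(range(len(base)), key=lambda i: scores[i], reverse=True)


# Toy 2-D "embeddings" (an assumption standing in for real sentence vectors).
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
order = cosine_rank([2.0, 0.1], toy_vectors)
top_2 = order[:2]  # same slicing idea as order[:5] in the README
```

Because cosine similarity compares direction only, the query's length does not affect the ranking; slicing the returned index list gives the top matches, which can then be used to look up the original sentences.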