opensearch-project/opensearch-neural-sparse-encoding-doc-v1

arthurbr11 committed on Jun 10

Commit cf44a30 · 1 Parent(s): 37f8b09

Update README.md

Files changed (1)

README.md +54 -0

@@ -9,6 +9,14 @@ tags:
 - passage-retrieval
 - document-expansion
 - bag-of-words
+- sentence-transformers
+- sparse-encoder
+- sparse
+- asymmetric
+- inference-free
+- splade
+pipeline_tag: feature-extraction
+library_name: sentence-transformers
 ---
 # opensearch-neural-sparse-encoding-doc-v1
@@ -36,6 +44,52 @@ This model is trained on MS MARCO dataset.
 OpenSearch neural sparse feature supports learned sparse retrieval with lucene inverted index. Link: https://opensearch.org/docs/latest/query-dsl/specialized/neural-sparse/. The indexing and search can be performed with OpenSearch high-level API.
+## Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers.sparse_encoder import SparseEncoder
+# Download from the 🤗 Hub
+model = SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v1")
+query = "What's the weather in ny now?"
+document = "Currently New York is rainy."
+query_embed = model.encode_query(query)
+document_embed = model.encode_document(document)
+sim = model.similarity(query_embed, document_embed)
+print(f"Similarity: {sim}")
+# Similarity: tensor([[12.8465]])
+# Visualize top tokens for each text
+top_k = 3
+print(f"\nTop tokens {top_k} for each text:")
+decoded_query = model.decode(query_embed, top_k=top_k)
+decoded_document = model.decode(document_embed)
+for i in range(top_k):
+    query_token, query_score = decoded_query[i]
+    doc_score = next((score for token, score in decoded_document if token == query_token), 0)
+    if doc_score != 0:
+        print(f"Token: {query_token}, Query score: {query_score:.4f}, Document score: {doc_score:.4f}")
+# Top tokens 3 for each text:
+# Token: ny, Query score: 5.7729, Document score: 1.0552
+# Token: weather, Query score: 4.5684, Document score: 1.1697
+# Token: now, Query score: 3.5895, Document score: 0.3932
+```
 ## Usage (HuggingFace)
 This model is supposed to run inside OpenSearch cluster. But you can also use it outside the cluster, with HuggingFace models API.
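
The token breakdown printed in the added example hints at how the similarity score comes about. The following is a minimal sketch, assuming the score is a plain dot product over the vocabulary tokens that the query and document share (an assumption about the model's configured similarity function, not something this commit states). The `sparse_dot` helper and the two dictionaries are hypothetical names introduced for illustration; the weights are copied from the example output in the diff.

```python
# Hypothetical helper: dot product of two sparse vectors stored as
# {token: weight} dicts. Only tokens present in both dicts contribute.
def sparse_dot(query_weights, doc_weights):
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )

# Weights copied from the example output in the diff (top 3 query tokens only).
query_weights = {"ny": 5.7729, "weather": 4.5684, "now": 3.5895}
doc_weights = {"ny": 1.0552, "weather": 1.1697, "now": 0.3932}

score = sparse_dot(query_weights, doc_weights)
print(f"Approximate similarity: {score:.4f}")
```

Under that assumption the three listed tokens alone sum to roughly 12.85, which is close to the `12.8465` reported by `model.similarity` in the example, so the remaining query tokens appear to contribute very little for this pair.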