hooman650/ct2fast-bge-reranker

hooman650 committed on Nov 16, 2023

Commit 324441d · 1 Parent(s): 2e4d5be

Update README.md

Files changed (1)

README.md +84 -1

README.md CHANGED

@@ -1,3 +1,86 @@
 ---
-license: apache-2.0
 ---

 ---
+license: mit
+language:
+- en
+pipeline_tag: text-classification
+tags:
+- medical
+- finance
+- chemistry
+- biology
 ---
+# BGE-Reranker-Large
+<!-- Provide a quick summary of what the model is/does. -->
+This is an `int8`-converted version of [bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large). Thanks to `ctranslate2`, it should
+be at least 3 times faster than the original Hugging Face Transformers version, while also being smaller, with minimal performance loss.
+## Model Details
+Unlike the embedding model `bge-large-en-v1.5`, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding.
+You can get a relevance score by passing a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
+In addition, this is a highly optimized version built with the `ctranslate2` library, suitable for production environments.
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+This model is based on the `BAAI` `BGE-Reranker` model. Please visit [bge-reranker-original-repo](https://huggingface.co/BAAI/bge-reranker-large)
+for more details.
+## Usage
+Install the dependencies with `pip install ctranslate2 transformers torch`, and then run:
+```python
+import ctranslate2
+import transformers
+import torch
+from huggingface_hub import snapshot_download
+# run on the GPU when one is available, otherwise fall back to the CPU
+device_mapping = "cuda" if torch.cuda.is_available() else "cpu"
+# ctranslate2.Encoder expects a local model directory, so download the converted model from the Hub first
+model_dir = snapshot_download("hooman650/ct2fast-bge-reranker")
+# the ctranslate2 encoder does the heavy lifting
+encoder = ctranslate2.Encoder(model_dir, device=device_mapping)
+# the tokenizer and the classification head come from the original HF model
+model_name = "BAAI/bge-reranker-large"
+tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
+classifier = transformers.AutoModelForSequenceClassification.from_pretrained(model_name).classifier
+classifier.eval()
+classifier.to(device_mapping)
+# each pair is [query, passage]
+pairs = [
+    ["I like Ctranslate2","Ctranslate2 makes mid range models faster"],
+    ["I like Ctranslate2","Using naive transformers might not be suitable for deployment"]
+]
+with torch.no_grad():
+    # token ids for every [query, passage] pair
+    tokens = tokenizer(pairs, padding=True, truncation=True, max_length=512).input_ids
+    output = encoder.forward_batch(tokens)
+    hidden_state = torch.as_tensor(output.last_hidden_state, device=device_mapping)
+    # one raw relevance score (logit) per pair
+    logits = classifier(hidden_state).squeeze(-1)
+print(logits)
+# tensor([ 1.0474, -9.4694], device='cuda:0')
+```
+#### Hardware
+Supports both GPU and CPU.
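
The README in this diff notes that the relevance score is not bounded to a specific range. If bounded scores are easier to work with downstream, one common choice is to pass each raw logit through a sigmoid, which maps it into (0, 1) without changing the ranking. The sketch below is not part of the commit: `sigmoid` and `rank_passages` are hypothetical helper names, and the logits are the two values copied from the example output in the README.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_passages(passages, logits):
    """Pair each passage with sigmoid(logit) and sort from most to least relevant.

    Hypothetical helper: `logits` is any sequence of raw reranker scores,
    for example `logits.tolist()` from the usage snippet in the README.
    """
    if len(passages) != len(logits):
        raise ValueError("passages and logits must have the same length")
    scored = [(passage, sigmoid(float(logit))) for passage, logit in zip(passages, logits)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Passages and raw scores taken from the README example
passages = [
    "Ctranslate2 makes mid range models faster",
    "Using naive transformers might not be suitable for deployment",
]
logits = [1.0474, -9.4694]

for passage, score in rank_passages(passages, logits):
    print(f"{score:.4f}  {passage}")
```

Because the sigmoid is monotonic, sorting by the normalized score gives the same order as sorting by the raw logit. The normalization only matters when a fixed threshold or a human-readable score is needed.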