---
license: mit
language:
- en
tags:
- medical
- finance
- chemistry
- biology
---
![BGE-reranking](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*tCBbIjV_jLZP1AKLTX7rAw.png)

# BGE-Reranker-Large


This is an `int8` converted version of [bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large). Thanks to `ctranslate2`, it should
be at least 3 times faster than the original Hugging Face Transformers version while also being smaller, with minimal performance loss.
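
If you want to reproduce the conversion yourself, CTranslate2 ships a Transformers converter. A minimal sketch (the output directory name is arbitrary):

```python
import ctranslate2

# convert the original Hugging Face checkpoint to the CTranslate2 format
# with int8 weight quantization
converter = ctranslate2.converters.TransformersConverter("BAAI/bge-reranker-large")
converter.convert("ct2fast-bge-reranker", quantization="int8")
```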



## Model Details
Unlike the embedding model `bge-large-en-v1.5`, the reranker takes a query and a document as input and directly outputs a similarity score instead of an embedding.
You can get a relevance score by feeding a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
In addition, this is a highly optimized version built with the `ctranslate2` library, making it suitable for production environments.
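
If you need scores in a fixed range, a common convention (not part of the model itself) is to squash the raw logits through a sigmoid:

```python
import torch

# raw reranker outputs are unbounded logits; a sigmoid maps them into (0, 1)
# the values below are the logits produced by the usage example further down
logits = torch.tensor([1.0474, -9.4694])
scores = torch.sigmoid(logits)
print(scores)  # approx. [0.7403, 0.0001] -- higher means more relevant
```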

### Model Sources

The original model is the `BAAI` `BGE-Reranker` model. Please visit the [original bge-reranker repo](https://huggingface.co/BAAI/bge-reranker-large)
for more details.

## Usage

Simply `pip install ctranslate2 transformers torch` and then:

```python
import ctranslate2
import torch
import transformers
from huggingface_hub import snapshot_download

device_mapping = "cuda" if torch.cuda.is_available() else "cpu"

# CTranslate2 loads from a local directory, so download the converted weights first
model_dir = snapshot_download("hooman650/ct2fast-bge-reranker")

# the CTranslate2 encoder does the heavy lifting
encoder = ctranslate2.Encoder(model_dir, device=device_mapping)

# the classification head comes from the original HF model
model_name = "BAAI/bge-reranker-large"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
classifier = transformers.AutoModelForSequenceClassification.from_pretrained(model_name).classifier

classifier.eval()
classifier.to(device_mapping)

pairs = [
    ["I like Ctranslate2", "Ctranslate2 makes mid range models faster"],
    ["I like Ctranslate2", "Using naive transformers might not be suitable for deployment"],
]

with torch.no_grad():
    # tokenize the query-passage pairs and run them through the CTranslate2 encoder
    tokens = tokenizer(pairs, padding=True, truncation=True, max_length=512).input_ids
    output = encoder.forward_batch(tokens)

    # wrap the hidden states in a torch tensor and apply the classification head
    hidden_state = torch.as_tensor(output.last_hidden_state, device=device_mapping)
    logits = classifier(hidden_state).squeeze()

print(logits)

# tensor([ 1.0474, -9.4694], device='cuda:0')
```
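
To rerank a set of candidate passages against one query, score every pair and sort by logit. A minimal sketch reusing the `tokenizer`, `encoder`, `classifier`, and `device_mapping` objects from above (`rerank` is a hypothetical helper, not part of any library):

```python
def rerank(query: str, passages: list[str]):
    # hypothetical helper: returns passages sorted from most to least relevant
    pairs = [[query, passage] for passage in passages]
    with torch.no_grad():
        tokens = tokenizer(pairs, padding=True, truncation=True, max_length=512).input_ids
        output = encoder.forward_batch(tokens)
        hidden_state = torch.as_tensor(output.last_hidden_state, device=device_mapping)
        scores = classifier(hidden_state).squeeze(-1).tolist()
    return sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)

ranked = rerank(
    "what is CTranslate2?",
    ["CTranslate2 is a fast inference engine for Transformer models.",
     "Bananas are yellow."],
)
print(ranked[0])  # the most relevant passage and its score
```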


### Hardware

Supports both GPU and CPU inference.
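
On CPU-only machines the same code works unchanged; only the `device` argument differs. For finer control, `ctranslate2.Encoder` also accepts `compute_type` and threading options. A sketch for an explicit int8 CPU setup (the thread count is an arbitrary example):

```python
import ctranslate2
from huggingface_hub import snapshot_download

model_dir = snapshot_download("hooman650/ct2fast-bge-reranker")

# force int8 execution on CPU and use 4 threads per batch
encoder = ctranslate2.Encoder(
    model_dir,
    device="cpu",
    compute_type="int8",
    intra_threads=4,
)
```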