|
--- |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- feature-extraction |
|
- sentence-similarity |
|
--- |
|
# BGE-M3 in HuggingFace Transformers
|
|
|
> **This is not an official implementation of BGE-M3. The official implementation can be found in the [Flag Embedding](https://github.com/FlagOpen/FlagEmbedding) project.**
|
|
|
## Introduction |
|
|
|
For the full introduction, please see the GitHub repo:
|
|
|
https://github.com/liuyanyi/transformers-bge-m3 |
|
|
|
## Use BGE-M3 in HuggingFace Transformers
|
|
|
```python
from transformers import AutoModel, AutoTokenizer

model_path = "path/to/bge-m3"  # Hub repo id of this model or a local checkpoint directory

# trust_remote_code is required to load the custom BGE-M3 model class
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)

input_str = "Hello, world!"
inputs = tokenizer(input_str, return_tensors="pt", padding=True, truncation=True)

output = model(**inputs, return_dict=True)

dense_output = output.dense_output      # normalize to align with the Flag Embedding project
colbert_output = output.colbert_output  # normalize to align with the Flag Embedding project
sparse_output = output.sparse_output
```
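The comments above note that normalization is needed to align the dense and ColBERT outputs with the Flag Embedding project. Below is a minimal sketch of that post-processing, assuming the outputs are plain PyTorch tensors (`dense_output` of shape `[batch, hidden]`, `colbert_output` of shape `[batch, seq_len, dim]`):

```python
import torch.nn.functional as F

# L2-normalize the dense sentence embedding along the hidden dimension
dense_embedding = F.normalize(dense_output, p=2, dim=-1)

# L2-normalize each token vector of the ColBERT output
colbert_embedding = F.normalize(colbert_output, p=2, dim=-1)

# With normalized dense embeddings, cosine similarity reduces to a dot product,
# e.g. pairwise similarity scores for a batch of sentences:
scores = dense_embedding @ dense_embedding.T
```

For sentence similarity between multiple inputs, tokenize them together as a batch and compare the normalized dense embeddings as shown above.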
|
|
|
## References |
|
|
|
- [Official BGE-M3 Weights](https://huggingface.co/BAAI/bge-m3)
|
- [Flag Embedding](https://github.com/FlagOpen/FlagEmbedding) |
|
- [HuggingFace Transformers](https://github.com/huggingface/transformers)