Update README.md
README.md
CHANGED
@@ -27,7 +27,7 @@ Utilizing the re-ranking model (e.g., [bge-reranker](https://github.com/FlagOpen

## News:

- 2/6/2024: We release [MLDR](https://huggingface.co/datasets/Shitao/MLDR) (a long-document retrieval dataset covering 13 languages) and its [evaluation pipeline](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR).
- 2/1/2024: **Thanks for the excellent tool from Vespa.** You can easily use multiple modes of BGE-M3 following this [notebook](https://github.com/vespa-engine/pyvespa/blob/master/docs/sphinx/source/examples/mother-of-all-embedding-models-cloud.ipynb); a minimal local sketch with the FlagEmbedding package follows below.
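
For readers who want to try the three retrieval modes locally rather than through Vespa, here is a minimal sketch using the FlagEmbedding package. It follows the package's documented `BGEM3FlagModel` usage; treat the example sentences and printed fields as illustrative rather than the canonical example.

```python
# Minimal sketch: encode text with BGE-M3 in its three modes
# (dense, sparse/lexical, and multi-vector/ColBERT) via FlagEmbedding.
# pip install -U FlagEmbedding
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)  # fp16 speeds up GPU inference

sentences = [
    "What is BGE M3?",
    "BGE M3 is a multi-lingual, multi-functionality, multi-granularity embedding model.",
]
output = model.encode(
    sentences,
    return_dense=True,         # single-vector dense embeddings
    return_sparse=True,        # per-token lexical weights for sparse retrieval
    return_colbert_vecs=True,  # per-token vectors for multi-vector (ColBERT-style) scoring
)

print(output["dense_vecs"].shape)       # (2, 1024) dense embeddings
print(output["lexical_weights"][0])     # token-id -> weight map for the first sentence
print(output["colbert_vecs"][0].shape)  # (num_tokens, 1024) multi-vector embedding
```

Dense scores are inner products of `dense_vecs`; the sparse and multi-vector scores can be combined with them as described in the report.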

@@ -243,22 +243,29 @@ The small-batch strategy is simple but effective, which can also be used to fine-tune

- MCLS: A simple method to improve performance on long text without fine-tuning (see the sketch below).
  If you do not have enough resources to fine-tune the model on long text, this method is useful.

Refer to our [report](https://arxiv.org/pdf/2402.03216.pdf) for more details.
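
The report describes MCLS as inserting a `[CLS]` token in front of every fixed-size chunk of tokens (every 256 tokens in the paper) and averaging the last hidden states at all CLS positions at inference time. The following is a rough, unofficial sketch of that idea with `transformers`; the stride, truncation length, and pooling details are assumptions, not the released implementation.

```python
# Unofficial sketch of the MCLS idea: insert a CLS token at the start of every
# `stride`-token chunk of a long input and average the final hidden states at
# all CLS positions, so long texts can be embedded without long-text fine-tuning.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
model = AutoModel.from_pretrained("BAAI/bge-m3")
model.eval()

def mcls_embed(text: str, stride: int = 256, max_tokens: int = 8000) -> torch.Tensor:
    # Tokenize without special tokens, leaving room for the CLS tokens we insert.
    ids = tokenizer(text, add_special_tokens=False, truncation=True,
                    max_length=max_tokens)["input_ids"]
    cls_id, eos_id = tokenizer.cls_token_id, tokenizer.sep_token_id

    # Re-insert a CLS token before every `stride`-token chunk, then close with EOS.
    with_cls = []
    for start in range(0, len(ids), stride):
        with_cls.append(cls_id)
        with_cls.extend(ids[start:start + stride])
    with_cls.append(eos_id)

    input_ids = torch.tensor([with_cls])
    attention_mask = torch.ones_like(input_ids)
    with torch.no_grad():
        hidden = model(input_ids=input_ids,
                       attention_mask=attention_mask).last_hidden_state[0]

    # Average the hidden states at every CLS position and L2-normalize.
    cls_positions = (input_ids[0] == cls_id).nonzero(as_tuple=True)[0]
    embedding = hidden[cls_positions].mean(dim=0)
    return torch.nn.functional.normalize(embedding, dim=0)

doc_embedding = mcls_embed("A very long document about multilingual retrieval. " * 400)
print(doc_embedding.shape)  # torch.Size([1024])
```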

**The fine-tuning codes and datasets will be open-sourced in the near future.**

## Acknowledgement

Thanks to the authors of the open-sourced datasets, including MIRACL, MKQA, NarrativeQA, etc.
Thanks to the open-sourced libraries like [Tevatron](https://github.com/texttron/tevatron) and [pyserial](https://github.com/pyserial/pyserial).

## Citation

If you find this repository useful, please consider giving a star :star: and a citation:

```
@misc{bge-m3,
      title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
      author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
      year={2024},
      eprint={2402.03216},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```