Merge branch 'main' of https://huggingface.co/yazge/turkish-colbert-onnx
README.md
CHANGED
@@ -1,23 +1,28 @@
 ---
 tags:
-- Turkish
-- turkish
-- passage-retrieval
+- Turkish
+- turkish
+- passage-retrieval
 license: mit
 language:
-- tr
+- tr
 base_model: ytu-ce-cosmos/turkish-base-bert-uncased
 ---
+
 # Turkish-ColBERT
+
 This is a Turkish passage retrieval model based on the [ColBERT](https://doi.org/10.48550/arXiv.2112.01488) architecture.
 
 The [Cosmos Turkish Base BERT](https://huggingface.co/ytu-ce-cosmos/turkish-base-bert-uncased) model was fine-tuned on 500k triplets (query, positive passage, negative passage) from a Turkish-translated version of the [MS MARCO dataset](https://huggingface.co/datasets/parsak/msmarco-tr).
 
 #### ⚠ Uncased use requires manual lowercase conversion
-
+
+Convert your text to lower case as follows:
+
 ```python
 text.replace("I", "ı").lower()
 ```
+
 This is due to a [known issue](https://github.com/huggingface/transformers/issues/6680) with the tokenizer.
 
 ## Example Usage
@@ -51,16 +56,19 @@ print(results[0]['content']) # "marie curie, radyoaktivite üzerine yaptığı
 ```
 
 # Evaluation
-
-
-
-| [
+
+| Dataset | R@1 | R@5 | R@10 | MRR@10 |
+| ------------------------------------------------------------------------ | ----- | ----- | ----- | ------ |
+| [Scifact-tr](https://huggingface.co/datasets/AbdulkaderSaoud/scifact-tr) | 48.38 | 67.85 | 75.52 | 56.88 |
+| [WikiRAG-TR](https://huggingface.co/datasets/Metin/WikiRAG-TR) | 31.21 | 75.63 | 79.63 | 49.08 |
 
 # Acknowledgments
+
 - Research supported with Cloud TPUs from [Google's TensorFlow Research Cloud](https://sites.research.google/trc/about/) (TFRC). Thanks for providing access to the TFRC ❤️
 - Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗
 
 # Citations
+
 ```bibtex
 @article{kesgin2023developing,
 title={Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models},
@@ -70,7 +78,9 @@ print(results[0]['content']) # "marie curie, radyoaktivite üzerine yaptığı
 }
 ```
 
-### Contact
-
+### Contact
+
+COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department <br>
 https://cosmos.yildiz.edu.tr/ <br>
-[email protected] <br>
+[email protected] <br>
+
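
Note on the lowercase conversion in the README: the one-liner maps dotless capital `I` to `ı` before calling `lower()`. The sketch below wraps that step in a helper so it can be applied to queries and passages alike. It is not part of this commit. The function name `turkish_lower` is hypothetical, and the additional mapping of dotted capital `İ` to `i` is an assumption added here, because Python's `str.lower()` turns `İ` into `i` followed by a combining dot (U+0307).

```python
# Minimal sketch of Turkish-aware lowercasing (hypothetical helper, not from the README).
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı


def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish casing rules for I and İ."""
    # Assumption: map İ -> i explicitly so str.lower() does not emit "i" + U+0307.
    text = text.replace(DOTTED_CAPITAL_I, "i")
    # Same step as the README snippet: map I -> ı before lowercasing.
    text = text.replace("I", DOTLESS_SMALL_I)
    return text.lower()


if __name__ == "__main__":
    print(turkish_lower("ISPARTA"))
    print(turkish_lower("\u0130STANBUL"))
```

Replacing both capitals before `lower()` keeps the output free of combining marks, which may matter if the tokenizer vocabulary only contains the plain `i` and `ı` forms.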