mehmet.erdogan committed on
Commit f198482 · 2 Parent(s): aeb7e50 629b34c

Merge branch 'main' of https://huggingface.co/yazge/turkish-colbert-onnx

Files changed (1)
  1. README.md +22 -12
README.md CHANGED
@@ -1,23 +1,28 @@
  ---
  tags:
- - Turkish
- - turkish
- - passage-retrieval
+ - Turkish
+ - turkish
+ - passage-retrieval
  license: mit
  language:
- - tr
+ - tr
  base_model: ytu-ce-cosmos/turkish-base-bert-uncased
  ---
+
  # Turkish-ColBERT
+
  This is a Turkish passage retrieval model based on the [ColBERT](https://doi.org/10.48550/arXiv.2112.01488) architecture.

  The [Cosmos Turkish Base BERT](https://huggingface.co/ytu-ce-cosmos/turkish-base-bert-uncased) model was fine-tuned on 500k triplets (query, positive passage, negative passage) from a Turkish-translated version of the [MS MARCO dataset](https://huggingface.co/datasets/parsak/msmarco-tr).

  #### ⚠ Uncased use requires manual lowercase conversion
- Convert your text to lower case as follows:
+
+ Convert your text to lower case as follows:
+
  ```python
  text.replace("I", "ı").lower()
  ```
+
  This is due to a [known issue](https://github.com/huggingface/transformers/issues/6680) with the tokenizer.

  ## Example Usage
@@ -51,16 +56,19 @@ print(results[0]['content']) # "marie curie, radyoaktivite üzerine yaptığı
  ```

  # Evaluation
- | Dataset | R@1 | R@5 | R@10 | MRR@10 |
- |-------------|--------------|--------------|--------------|---------------|
- | [Scifact-tr](https://huggingface.co/datasets/AbdulkaderSaoud/scifact-tr) | 48.38 | 67.85 | 75.52 | 56.88 |
- | [WikiRAG-TR](https://huggingface.co/datasets/Metin/WikiRAG-TR) | 31.21 | 75.63 | 79.63 | 49.08 |
+
+ | Dataset | R@1 | R@5 | R@10 | MRR@10 |
+ | ------------------------------------------------------------------------ | ----- | ----- | ----- | ------ |
+ | [Scifact-tr](https://huggingface.co/datasets/AbdulkaderSaoud/scifact-tr) | 48.38 | 67.85 | 75.52 | 56.88 |
+ | [WikiRAG-TR](https://huggingface.co/datasets/Metin/WikiRAG-TR) | 31.21 | 75.63 | 79.63 | 49.08 |

  # Acknowledgments
+
  - Research supported with Cloud TPUs from [Google's TensorFlow Research Cloud](https://sites.research.google/trc/about/) (TFRC). Thanks for providing access to the TFRC ❤️
  - Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗

  # Citations
+
  ```bibtex
  @article{kesgin2023developing,
  title={Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models},
@@ -70,7 +78,9 @@ print(results[0]['content']) # "marie curie, radyoaktivite üzerine yaptığı
  }
  ```

- ### Contact
- COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department <br>
+ ### Contact
+
+ COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department <br>
  https://cosmos.yildiz.edu.tr/ <br>

+
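The lowercasing note in the README above can be wrapped in a small helper. This is a minimal sketch, not part of the commit: the function name `turkish_lower` is hypothetical, and the extra handling of dotted capital `İ` is an assumption added here. Python's `str.lower()` turns `İ` into `i` followed by a combining dot (U+0307), so the sketch maps it to plain `i` first. The README does not say whether this matches the preprocessing used in training.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text with Turkish rules for the dotted/dotless I pair."""
    # README step: map dotless capital "I" to "ı" before lowercasing,
    # because str.lower() alone would turn "I" into "i".
    # Assumption (not in the README): map dotted capital "İ" to "i" first,
    # since str.lower() would otherwise yield "i" plus U+0307.
    return text.replace("İ", "i").replace("I", "ı").lower()


if __name__ == "__main__":
    print(turkish_lower("IŞIK"))
    print(turkish_lower("İSTANBUL"))
```

Apply the same helper to both queries and passages so that indexing and search see identically normalized text.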