Update README.md with a note on dataset preparation
README.md
CHANGED
@@ -20,6 +20,11 @@ A XLM-RoBERTa-base model trained on [mMARCO](https://github.com/unicamp-dl/mMARC
The base checkpoint comes from [k-ush/xlm-roberta-base-ance-warmup](https://huggingface.co/k-ush/xlm-roberta-base-ance-warmup), so this model was trained on both English and Japanese data.
I uploaded the checkpoint at 50k steps because MRR@100 decreased at the 60k checkpoint (MRR@100 (rerank, full): 0.242, 0.182).

+# Dataset
+I formatted the Japanese mMARCO dataset for ANCE.
+The dataset preparation script is available on GitHub:
+https://github.com/argonism/JANCE/blob/master/data/gen_jp_data.py
+
# Evaluation Result
Evaluation results during training on the mMARCO Japanese dev set.
``` text