k-ush committed on
Commit 33c0736 · 1 Parent(s): a1ddbe5

Update README.md with a note on dataset preparation

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -20,6 +20,11 @@ A XLM-RoBERTa-base model trained on [mMARCO](https://github.com/unicamp-dl/mMARC
  Base checkpoint comes from [k-ush/xlm-roberta-base-ance-warmup](https://huggingface.co/k-ush/xlm-roberta-base-ance-warmup), so this model was trained on both English and Japanese data.
  I uploaded the checkpoint at 50k steps because MRR@100 decreased at the 60k checkpoint (mrr@100(rerank, full): 0.242, 0.182).
 
+ # Dataset
+ I formatted the Japanese mMARCO dataset for ANCE.
+ The dataset preparation script is available on GitHub:
+ https://github.com/argonism/JANCE/blob/master/data/gen_jp_data.py
+
  # Evaluation Result
  Evaluation results during training on the mMARCO Japanese dev set.
  ``` text