ccasimiro committed
Commit: 8f9c24f
Parent: cc74854

Update README.md

Files changed (1):
  1. README.md +7 -3
README.md CHANGED
@@ -14,9 +14,9 @@ widget:
 ---
 
 # Biomedical language model for Spanish
+Biomedical pretrained language model for Spanish. For more details about the corpus, the pretraining and the evaluation, read the paper below.
 
 ## BibTeX citation
-
 If you use any of these resources (datasets or models) in your work, please cite our latest paper:
 
 ```bibtex
@@ -30,9 +30,13 @@ If you use any of these resources (datasets or models) in your work, please cite
 }
 ```
 
-## Model and tokenization
+## Tokenization and model pretraining
+
 This model is a [RoBERTa-based](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model trained on a
-**biomedical** corpus collected from several sources (see next section).
+**biomedical** corpus in Spanish collected from several sources (see next section).
+The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2)
+used in the original [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model, with a vocabulary size of 52,000 tokens. The pretraining consists of masked language model training at the subword level, following the approach employed for the RoBERTa base model with the same hyperparameters as in the original work. Training took a total of 48 hours on 16 NVIDIA V100 GPUs with 16GB of memory each, using the Adam optimizer with a peak learning rate of 0.0005 and an effective batch size of 2,048 sentences.
+
 
 ## Training corpora and preprocessing
 
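
The README text added above describes tokenizing the training corpus with a byte-level BPE vocabulary of 52,000 tokens. As a rough sketch of that step (the commit itself ships no code, and the corpus file name below is a placeholder, not part of this repository), the Hugging Face `tokenizers` library can train such a vocabulary like this:

```python
# Minimal sketch: train a byte-level BPE tokenizer with a 52,000-token vocabulary,
# as described in the updated README. "biomedical_corpus.txt" is a hypothetical
# plain-text dump of the training corpus, not a file from this commit.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["biomedical_corpus.txt"],
    vocab_size=52_000,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],  # RoBERTa-style special tokens
)

# Writes vocab.json and merges.txt, the two files a RoBERTa byte-level BPE tokenizer needs.
tokenizer.save_model("tokenizer-biomedical-es")
```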
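Likewise, since the README describes a RoBERTa-style masked language model, the published checkpoint can presumably be queried for masked-token predictions; the model identifier below is a placeholder for illustration, not a name taken from this commit:

```python
# Hedged usage sketch: masked-token prediction with the pretrained biomedical model.
# "ORG/roberta-base-biomedical-es" is a placeholder model ID, assumed for this example.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="ORG/roberta-base-biomedical-es")

# Example Spanish clinical sentence using the RoBERTa <mask> token.
for prediction in fill_mask("El paciente fue ingresado por un <mask> agudo de miocardio."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```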