widget:
  example_title: "Exemplo 3"
- text: "Mitos e verdades sobre o <mask>. Doença que mais mata mulheres no Brasil."
  example_title: "Exemplo 4"
model-index:
- name: tgf-xlm-roberta-base-pt-br
  results: []
---

# tgf-xlm-roberta-base-pt-br

This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [BrWac](https://huggingface.co/datasets/thegoodfellas/brwac_tiny) dataset.
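
A quick way to try it is the `fill-mask` pipeline. The hub id below is inferred from this card's name and the org in the dataset link, so treat it as an assumption:

```python
from transformers import pipeline

# Hub id inferred from the card name and the thegoodfellas org; adjust if it differs.
unmasker = pipeline("fill-mask", model="thegoodfellas/tgf-xlm-roberta-base-pt-br")

# One of the widget examples from this card:
preds = unmasker("Mitos e verdades sobre o <mask>. Doença que mais mata mulheres no Brasil.")
for pred in preds:
    print(pred["token_str"], round(pred["score"], 3))
```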

## Model description

This is a version of xlm-roberta-base fine-tuned for Brazilian Portuguese. It was trained on the [BrWac](https://huggingface.co/datasets/thegoodfellas/brwac_tiny) dataset, following the principles of the [RoBERTa paper](https://arxiv.org/abs/1907.11692). The key strategies are:

1. *Full-Sentences*: quoted from the paper, "Each input is packed with full sentences sampled contiguously from one or more documents, such that the total length is at most 512 tokens. Inputs may cross document boundaries. When we reach the end of one document, we begin sampling sentences from the next document and add an extra separator token between documents." (See the packing sketch after this list.)

2. Tuned hyperparameters: adam_beta1=0.9, adam_beta2=0.98, adam_epsilon=1e-6, as the paper suggests.
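
A minimal sketch of the *Full-Sentences* packing from item 1, assuming plain lists of sentences per document; this is illustrative, not the card's actual preprocessing code:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

def pack_full_sentences(documents, max_len=512):
    # Flatten into token "units": each sentence's ids, plus an extra
    # separator token at every document boundary.
    units = []
    for doc in documents:
        for sentence in doc:
            units.append(tokenizer(sentence, add_special_tokens=False)["input_ids"])
        units.append([tokenizer.sep_token_id])
    # Greedily pack whole units into blocks of at most max_len tokens;
    # blocks may cross document boundaries (assumes no single sentence
    # exceeds max_len tokens).
    blocks, current = [], []
    for unit in units:
        if current and len(current) + len(unit) > max_len:
            blocks.append(current)
            current = []
        current.extend(unit)
    if current:
        blocks.append(current)
    return blocks

docs = [["Primeira frase do documento.", "Segunda frase."], ["Outro documento."]]
print([len(b) for b in pack_full_sentences(docs)])
```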

## Availability

The source code is available [here](https://github.com/the-good-fellas/xlm-roberta-pt-br).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-4
- train_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- num_epochs: 2
- mixed_precision_training: Native AMP
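
Put together with the Adam settings from the model description, these values map onto `transformers.TrainingArguments` roughly as below. This is a reconstruction from the list above, not the actual training script; `output_dir` is a placeholder, and the `512 = 16 × 8 × 4` factorization assumes 4 devices:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="tgf-xlm-roberta-base-pt-br",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,   # 16 x 8 x 4 devices = 512 total batch
    num_train_epochs=2,
    seed=42,
    adam_beta1=0.9,                  # from the model description
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    fp16=True,                       # Native AMP mixed precision
)
```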

### Framework versions

- Transformers 4.23.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.6.1
- Tokenizers 0.13.1

### Environment

Special thanks to [DataCrunch.io](https://datacrunch.io) for their amazing and affordable GPUs.

<img src="https://datacrunch.io/_next/static/media/Logo.6b773500.svg" width="20%"/>