Commit 1744eda by ayoubkirouane (parent: 0041d7f): Update README.md

README.md (changed):

model-index:
- name: distilroberta-base-finetuned-wikitext2
  results: []
datasets:
- wikitext
language:
- en
pipeline_tag: fill-mask
---

# distilroberta-base-finetuned-wikitext2

This model achieves the following results on the evaluation set:

- Loss: 1.8538
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent `TrainingArguments` follows the list):

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
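
For reference, the list above corresponds roughly to the following Hugging Face `TrainingArguments`. This is a minimal sketch under stated assumptions, not the original training script: `output_dir` is a placeholder, the batch sizes are interpreted as per-device values, and per-epoch evaluation is inferred from the results table below.

```python
from transformers import TrainingArguments

# Hyperparameters listed above, expressed as TrainingArguments.
training_args = TrainingArguments(
    output_dir="distilroberta-base-finetuned-wikitext2",  # placeholder output directory
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    # Adam betas/epsilon below are the library defaults, spelled out to match the list.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",  # assumption: matches the per-epoch validation losses reported below
)
```
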
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.101         | 1.0   | 1203 | 1.9475          |
| 2.0265        | 2.0   | 2406 | 1.8914          |
| 1.9672        | 3.0   | 3609 | 1.8538          |
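
As an informal reading of the table (not a metric reported by the training run), if the validation loss is the mean token-level cross-entropy in nats, the corresponding perplexity is exp(loss):

```python
import math

# Final-epoch validation loss from the table above.
eval_loss = 1.8538

# Perplexity = exp(cross-entropy); roughly 6.38 under this assumption.
print(f"approx. perplexity: {math.exp(eval_loss):.2f}")
```
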
### Framework versions

- Transformers 4.33.0
- PyTorch 2.0.0
- Datasets 2.1.0
- Tokenizers 0.13.3
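
To check whether a local environment matches these versions, a quick sketch (the comments show the versions listed above; outputs will differ on other setups):

```python
import transformers, torch, datasets, tokenizers

# Versions used for this training run, per the list above.
print(transformers.__version__)  # 4.33.0
print(torch.__version__)         # 2.0.0
print(datasets.__version__)      # 2.1.0
print(tokenizers.__version__)    # 0.13.3
```
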
## Overview

- **Model Name**: FILL-MAsk-RoBERTa-base
- **Task**: Masked language modeling (fill-mask)
- **Dataset**: WikiText-2

## Model Description

**FILL-MAsk-RoBERTa-base** is a distilled version of the RoBERTa-base model for masked language modeling. It follows a training procedure similar to DistilBERT's, yielding a smaller model with 6 layers, a hidden size of 768, and 12 attention heads. It has about 82 million parameters in total, compared to 125 million for the original RoBERTa-base, and on average DistilRoBERTa is roughly twice as fast as RoBERTa-base.
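
These architecture figures can be checked against the published `distilroberta-base` checkpoint that this model builds on. The snippet below is an illustrative sketch (it downloads the base model from the Hub) and is not part of this model's training code:

```python
from transformers import AutoConfig, AutoModelForMaskedLM

# Inspect the DistilRoBERTa architecture underlying this model.
config = AutoConfig.from_pretrained("distilroberta-base")
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)  # 6 768 12

# Total parameter count (on the order of 82M, vs. ~125M for roberta-base).
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")
print(sum(p.numel() for p in model.parameters()))
```
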
## Usage

**FILL-MAsk-RoBERTa-base** can be used both directly and for downstream tasks. It is suited to masked language modeling, where tokens in the input are masked and the model predicts them, and it is also intended to be fine-tuned on downstream tasks such as sequence classification, token classification, or question answering. Users can explore the Hugging Face Model Hub to find versions of this model fine-tuned for specific tasks of interest.
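
A minimal way to try the model is through the `fill-mask` pipeline. The repository id below is an assumption inferred from this card (author `ayoubkirouane`, model `distilroberta-base-finetuned-wikitext2`); substitute the actual Hub id if it differs:

```python
from transformers import pipeline

# Hypothetical Hub id inferred from this card; replace with the real repository id.
fill_mask = pipeline("fill-mask", model="ayoubkirouane/distilroberta-base-finetuned-wikitext2")

# RoBERTa-style tokenizers use <mask> as the mask token.
for prediction in fill_mask("The capital of France is <mask>."):
    print(prediction["token_str"], round(prediction["score"], 4))
```
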
## Limitations

- **Bias**: Significant research has explored bias and fairness issues in language models. Predictions generated by this model may contain biases, including harmful stereotypes related to protected classes, identity characteristics, and sensitive social and occupational groups.
- **Fairness**: Be aware of fairness considerations when using this model, and ensure that its predictions do not contribute to unfair or harmful outcomes.
- **Ethical Use**: Use this model ethically and responsibly, taking into account its potential for bias and ensuring that it does not generate harmful or offensive content.