This is a <b>DeBERTa</b> <b>[1]</b> model for the <b>Italian</b> language, fine-tuned for <b>Extractive Question Answering</b> on the [SQuAD-IT](https://huggingface.co/datasets/squad_it) dataset <b>[2]</b>.
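
As a quick reference, below is a minimal usage sketch with the Transformers `question-answering` pipeline; the repository id is a placeholder (it is not stated in this section) and the question/context pair is purely illustrative.

```python
# Minimal usage sketch. The model id below is a placeholder: replace it
# with the actual repository id of this model.
from transformers import pipeline

qa = pipeline("question-answering", model="<this-model-repository-id>")

result = qa(
    question="Quando è stata fondata Roma?",
    context="Secondo la tradizione, Roma fu fondata da Romolo nel 753 a.C.",
)
print(result["answer"], result["score"])
```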
<h3>Training and Performances</h3>

The model is trained to perform question answering given a context and a question, under the assumption that the context contains the answer to the question. It has been fine-tuned for Extractive Question Answering on the SQuAD-IT dataset for 2 epochs, with a linearly decaying learning rate starting from 3e-5, a maximum sequence length of 384, and a document stride of 128.
<br>The dataset includes 54,159 training instances and 7,609 test instances.
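
As a sketch of how the maximum sequence length and document stride above are typically applied when tokenizing SQuAD-style examples with Transformers (the checkpoint and the example pair are placeholders, and this is not necessarily the exact preprocessing used for this model):

```python
# Sketch: splitting a (question, context) pair into 384-token windows with a
# 128-token document stride, as in standard extractive-QA preprocessing.
# The checkpoint is used here only for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")

question = "Quando è stata fondata Roma?"
context = "Secondo la tradizione, Roma fu fondata da Romolo nel 753 a.C. ..."

features = tokenizer(
    question,
    context,
    max_length=384,                   # maximum sequence length stated above
    stride=128,                       # document stride stated above
    truncation="only_second",         # truncate only the context, never the question
    return_overflowing_tokens=True,   # long contexts yield several overlapping windows
    return_offsets_mapping=True,      # used to map predicted spans back to characters
    padding="max_length",
)
print(len(features["input_ids"]))     # number of windows produced for this example
```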
<b>update: version 2.0</b>

The 2.0 version further improves performance by exploiting a two-phase fine-tuning strategy: the model is first fine-tuned on the English SQuAD v2 (1 epoch, 20% warmup ratio, and a maximum learning rate of 3e-5) and then further fine-tuned on the Italian SQuAD-IT (2 epochs, no warmup, initial learning rate of 3e-5).
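
Schematically, the two-phase schedule could be reproduced with the standard `Trainer` API as in the sketch below; this is not the authors' actual training script, and `squad_v2_en_features` / `squad_it_features` stand for already-tokenized datasets (for example, prepared as in the previous snippet).

```python
# Schematic two-phase fine-tuning with the hyperparameters stated above.
# squad_v2_en_features and squad_it_features are assumed to be tokenized
# datasets with start/end position labels; data collators, evaluation and
# checkpointing are omitted for brevity.
from transformers import AutoModelForQuestionAnswering, Trainer, TrainingArguments

model = AutoModelForQuestionAnswering.from_pretrained("microsoft/mdeberta-v3-base")

# Phase 1: English SQuAD v2 (1 epoch, 20% warmup ratio, peak learning rate 3e-5)
phase1 = TrainingArguments(
    output_dir="phase1-squad-v2-en",
    num_train_epochs=1,
    learning_rate=3e-5,
    warmup_ratio=0.2,
    lr_scheduler_type="linear",
)
Trainer(model=model, args=phase1, train_dataset=squad_v2_en_features).train()

# Phase 2: Italian SQuAD-IT (2 epochs, no warmup, initial learning rate 3e-5)
phase2 = TrainingArguments(
    output_dir="phase2-squad-it",
    num_train_epochs=2,
    learning_rate=3e-5,
    warmup_ratio=0.0,
    lr_scheduler_type="linear",
)
Trainer(model=model, args=phase2, train_dataset=squad_it_features).train()
```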
In order to maximize the benefits of the multilingual procedure, [mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) is used as the pre-trained model. Once the double fine-tuning is completed, the embedding layer is compressed as in [deberta-base-italian](https://huggingface.co/osiria/deberta-base-italian) to obtain a monolingual model size.
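
The embedding-compression step is not detailed in this card; the snippet below only illustrates the general idea (keeping the embedding rows of a reduced, Italian-only vocabulary), with a placeholder token list, and is not the exact procedure used for deberta-base-italian.

```python
# Illustration of vocabulary/embedding compression: keep only the embedding
# rows of the token ids retained in a reduced, Italian-only vocabulary.
# kept_tokens is a placeholder; in practice it would be derived from an
# Italian corpus, and the tokenizer would be shrunk accordingly.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model = AutoModelForQuestionAnswering.from_pretrained("microsoft/mdeberta-v3-base")
tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")

kept_tokens = ["[CLS]", "[SEP]", "[UNK]", "[PAD]", "▁ciao", "▁Roma"]  # placeholder list
kept_ids = tokenizer.convert_tokens_to_ids(kept_tokens)

old_embeddings = model.get_input_embeddings().weight.data
new_embeddings = torch.nn.Embedding(len(kept_ids), old_embeddings.shape[1])
new_embeddings.weight.data = old_embeddings[kept_ids].clone()

model.set_input_embeddings(new_embeddings)
model.config.vocab_size = len(kept_ids)
```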
The performance on the test set is reported in the following table:

(<b>version 2.0</b> performances)