David committed
Commit 296e183
1 Parent(s): b6d9472
Update README.md

README.md CHANGED

@@ -14,8 +14,8 @@ We release a `small` and `medium` version with the following configuration:
 
 | Model | Layers | Embedding/Hidden Size | Params | Vocab Size | Max Sequence Length | Cased |
 | --- | --- | --- | --- | --- | --- | --- |
-| SELECTRA small | 12 | 256 | 22M | 50k | 512 | True |
-| **SELECTRA medium** | **12** | **384** | **41M** | **50k** | **512** | **True** |
+| [SELECTRA small](https://huggingface.co/Recognai/selectra_small) | 12 | 256 | 22M | 50k | 512 | True |
+| [**SELECTRA medium**](https://huggingface.co/Recognai/selectra_medium) | **12** | **384** | **41M** | **50k** | **512** | **True** |
 
 Selectra small (medium) is about 5 (3) times smaller than BETO but achieves comparable results (see Metrics section below).
 
@@ -27,8 +27,8 @@ The discriminator should therefore activate the logit corresponding to the fake
 ```python
 from transformers import ElectraForPreTraining, ElectraTokenizerFast
 
-discriminator = ElectraForPreTraining.from_pretrained("Recognai/
-tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/
+discriminator = ElectraForPreTraining.from_pretrained("Recognai/selectra_small")
+tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small")
 
 sentence_with_fake_token = "Estamos desayunando pan rosa con tomate y aceite de oliva."
 
@@ -39,13 +39,15 @@ print("\t".join(tokenizer.tokenize(sentence_with_fake_token)))
 print("\t".join(map(lambda x: str(x)[:4], logits[1:-1])))
 """Output:
 Estamos desayun ##ando pan rosa con tomate y aceite de oliva .
--
+-3.1 -3.6 -6.9 -3.0 0.19 -4.5 -3.3 -5.1 -5.7 -7.7 -4.4 -4.2
 """
 ```
 
-However, you probably want to use this model to fine-tune it on a
+However, you probably want to use this model to fine-tune it on a downstream task.
+We provide models fine-tuned on the [XNLI dataset](https://huggingface.co/datasets/xnli), which can be used together with the zero-shot classification pipeline:
 
--
+- [Zero-shot SELECTRA small](https://huggingface.co/Recognai/zeroshot_selectra_small)
+- [Zero-shot SELECTRA medium](https://huggingface.co/Recognai/zeroshot_selectra_medium)
 
 ## Metrics
 
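The hunk above fills in the expected output of the README example, but the lines that actually build `logits` (roughly lines 35-38 of the README) fall outside the diff context. For reference, a minimal end-to-end sketch of the same fake-token detection flow, assuming the standard `transformers` ELECTRA API; the tokenizer call, the `torch.no_grad()` block, and the printing loop are reconstructions, not the README's exact omitted lines:

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Checkpoints taken from the added lines in the second hunk above.
discriminator = ElectraForPreTraining.from_pretrained("Recognai/selectra_small")
tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small")

# "rosa" is the planted fake token, so its logit should stand out.
sentence_with_fake_token = "Estamos desayunando pan rosa con tomate y aceite de oliva."

# Encode the sentence and run the discriminator: one logit per token,
# where a larger value means "this token looks replaced".
inputs = tokenizer(sentence_with_fake_token, return_tensors="pt")
with torch.no_grad():
    logits = discriminator(**inputs).logits.squeeze(0)

# Print each token next to its logit, skipping the [CLS]/[SEP] positions.
for token, logit in zip(tokenizer.tokenize(sentence_with_fake_token), logits[1:-1]):
    print(f"{token}\t{logit.item():.2f}")
```

Run on the example sentence, the planted token "rosa" should receive the largest logit, matching the 0.19 value shown in the output above.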
@@ -59,7 +61,7 @@ We fine-tune our models on 4 different down-stream tasks:
 For each task, we conduct 5 trials and state the mean and standard deviation of the metrics in the table below.
 To compare our results to other Spanish language models, we provide the same metrics taken from [Table 4](https://huggingface.co/bertin-project/bertin-roberta-base-spanish#results) of the Bertin-project model card.
 
-| Model | CoNLL2002 - POS (acc) | CoNLL2002 - NER (f1) | PAWS-X (acc) | XNLI (acc) | Params |
+| Model | [CoNLL2002](https://huggingface.co/datasets/conll2002) - POS (acc) | [CoNLL2002](https://huggingface.co/datasets/conll2002) - NER (f1) | [PAWS-X](https://huggingface.co/datasets/paws-x) (acc) | [XNLI](https://huggingface.co/datasets/xnli) (acc) | Params |
 | --- | --- | --- | --- | --- | --- |
 | SELECTRA small | 0.9653 +- 0.0007 | 0.863 +- 0.004 | 0.896 +- 0.002 | 0.784 +- 0.002 | **22M** |
 | SELECTRA medium | 0.9677 +- 0.0004 | 0.870 +- 0.003 | 0.896 +- 0.002 | **0.804 +- 0.002** | 41M |
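The third hunk above links two checkpoints fine-tuned on XNLI for zero-shot classification. A minimal sketch of how such a checkpoint is typically plugged into the `transformers` zero-shot classification pipeline; the example sentence, the candidate labels, and the Spanish `hypothesis_template` are illustrative assumptions rather than content from this commit:

```python
from transformers import pipeline

# One of the XNLI fine-tuned checkpoints linked above; the small variant
# (Recognai/zeroshot_selectra_small) is used the same way.
classifier = pipeline("zero-shot-classification",
                      model="Recognai/zeroshot_selectra_medium")

result = classifier(
    "Los precios de la vivienda subieron un 5% el último trimestre.",
    candidate_labels=["economía", "deportes", "cultura", "salud"],
    # Assumed Spanish template; the pipeline's default template is in English.
    hypothesis_template="Este ejemplo es {}.",
)
print(result["labels"][0], round(result["scores"][0], 3))
```

The pipeline scores each candidate label as an NLI hypothesis against the input sentence and returns the labels sorted by score.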