---
language: cs
license: MIT
---

# Small-E-Czech

Small-E-Czech is an [Electra](https://arxiv.org/abs/2003.10555)-small model pretrained on a Czech web corpus created at Seznam.cz. Like other pretrained models, it should be finetuned on the downstream task of interest before use. At Seznam.cz, it has helped improve web search ranking, query typo correction, and clickbait title detection. We release it under the MIT license (i.e. allowing commercial use).

### How to use the discriminator in transformers
```python
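import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Load the pretrained discriminator and its tokenizer.
discriminator = ElectraForPreTraining.from_pretrained("seznam/small-e-czech")
# strip_accents=False keeps Czech diacritics during tokenization.
tokenizer = ElectraTokenizerFast.from_pretrained(
    "seznam/small-e-czech", strip_accents=False
)

# Original sentence and a corrupted copy with one word replaced ("mé" -> "kočka").
sentence = "Za hory, za doly, mé zlaté parohy"
fake_sentence = "Za hory, za doly, kočka zlaté parohy"

# encode() adds [CLS] and [SEP], so add them to the printed tokens as well.
fake_sentence_tokens = ["[CLS]"] + tokenizer.tokenize(fake_sentence) + ["[SEP]"]
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")

# Inference only, so gradient tracking is switched off.
with torch.no_grad():
    outputs = discriminator(fake_inputs)
# outputs[0] holds one logit per token; the sigmoid maps logits to probabilities.
predictions = torch.sigmoid(outputs[0]).cpu().numpy()

# Print the tokens and, aligned below them, the per-token probabilities.
for token in fake_sentence_tokens:
    print("{:>7s}".format(token), end="")
print()

for prediction in predictions.squeeze():
    print("{:7.1f}".format(prediction), end="")
print()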
```

The output shows, for each token, the discriminator's estimated probability that the token does not belong in the sentence (i.e. that it was replaced by the generator):

```
  [CLS]     za   hory      ,     za    dol    ##y      ,  kočka  zlaté   paro   ##hy  [SEP]
    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.8    0.3    0.2    0.1    0.0
```
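
To turn these probabilities into a decision, one option is to flag every token whose score exceeds a threshold. The sketch below is only an illustration: the helper `flag_replaced_tokens` and the threshold of 0.5 are assumptions rather than part of the released model, and the tokens and scores are copied from the example output above so that the snippet runs without downloading anything.

```python
# Tokens and probabilities copied from the example output above.
tokens = ["[CLS]", "za", "hory", ",", "za", "dol", "##y", ",",
          "kočka", "zlaté", "paro", "##hy", "[SEP]"]
probs = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.3, 0.2, 0.1, 0.0]


def flag_replaced_tokens(tokens, probs, threshold=0.5):
    """Return (index, token, probability) for tokens scored above the threshold.

    Hypothetical helper; the 0.5 default is an assumed cut-off, not a tuned value.
    """
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")
    return [
        (i, token, prob)
        for i, (token, prob) in enumerate(zip(tokens, probs))
        if prob > threshold
    ]


flagged = flag_replaced_tokens(tokens, probs)
print(flagged)  # [(8, 'kočka', 0.8)]
```

With the assumed threshold, only the substituted word `kočka` is flagged. A lower threshold would also pick up neighbouring tokens such as `zlaté`, so in practice the cut-off would need to be chosen on held-out data.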

### Finetuning

For instructions on how to finetune the model on a new task, see the official Hugging Face Transformers [tutorial](https://huggingface.co/transformers/training.html).