---
language: is
datasets:
- Icelandic portion of the OSCAR corpus from INRIA
- oscar
---
# IsRoBERTa: a RoBERTa-like masked language model
Probably the first Icelandic transformer language model!
## Overview
- **Language:** Icelandic
- **Downstream task:** masked-lm
- **Training data:** OSCAR corpus (Icelandic portion)
- **Code:** [neurocode-io/icelandic-language-model](https://github.com/neurocode-io/icelandic-language-model)
- **Infrastructure:** 1x Nvidia K80
## Hyperparameters
```
per_device_train_batch_size = 48
n_epochs = 1
vocab_size = 52_000
max_position_embeddings = 514
num_attention_heads = 12
num_hidden_layers = 6
type_vocab_size = 1
learning_rate = 0.00005
```
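The hyperparameters above map onto the `RobertaConfig` class in Hugging Face Transformers. The following is an assumption-based sketch, not the authors' training script (that lives in the linked repository). It passes only the listed values and leaves `hidden_size`, `intermediate_size` and every other field at the library defaults, which this card does not confirm. `max_position_embeddings` is 514 rather than 512 because RoBERTa's position ids start after the padding index, so two positions are reserved while inputs can still be 512 tokens long.

```python
from transformers import RobertaConfig

# Model-architecture values copied from the hyperparameter list.
# Assumption: all unlisted fields (hidden_size, intermediate_size, ...) keep
# RobertaConfig's defaults; the card does not state them.
config = RobertaConfig(
    vocab_size=52_000,
    max_position_embeddings=514,  # 512 usable positions + 2 reserved offsets
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)

# Training-loop settings are not part of the model config. This dict is a
# hypothetical container whose keys follow the TrainingArguments field names.
training_settings = {
    "per_device_train_batch_size": 48,
    "num_train_epochs": 1,
    "learning_rate": 5e-5,
}

print(config.hidden_size, config.num_hidden_layers, config.vocab_size)
```

Keeping the architecture and the optimizer settings apart mirrors how Transformers splits them between the model config and the trainer arguments.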
## Usage
### In Transformers
```python
>>> from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM
>>> model_name = "neurocode/IsRoBERTa"
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> # AutoModelForMaskedLM replaces the deprecated AutoModelWithLMHead
>>> model = AutoModelForMaskedLM.from_pretrained(model_name)
>>> fill_mask = pipeline(
...     "fill-mask",
...     model=model,
...     tokenizer=tokenizer,
... )
>>> result = fill_mask("Hann fór út að <mask>.")
>>> result
[
{'sequence': '<s>Hann fór út að nýju.</s>', 'score': 0.03395755589008331, 'token': 2219, 'token_str': 'Ġnýju'},
{'sequence': '<s>Hann fór út að undanförnu.</s>', 'score': 0.029087543487548828, 'token': 7590, 'token_str': 'Ġundanförnu'},
{'sequence': '<s>Hann fór út að lokum.</s>', 'score': 0.024420788511633873, 'token': 4384, 'token_str': 'Ġlokum'},
{'sequence': '<s>Hann fór út að þessu.</s>', 'score': 0.021231256425380707, 'token': 921, 'token_str': 'Ġþessu'},
{'sequence': '<s>Hann fór út að honum.</s>', 'score': 0.0205782949924469, 'token': 1136, 'token_str': 'Ġhonum'}
]
```
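In the example output, every `token_str` starts with `Ġ`: RoBERTa-style byte-level BPE tokenizers use that character to mark a token preceded by a space. Below is a minimal sketch for turning `fill-mask` results into clean `(word, score)` pairs, assuming the list-of-dicts shape shown above. `clean_predictions` is a hypothetical helper, not part of Transformers, and the sample entries are copied from the output above. Depending on the Transformers version, `token_str` may come back with a plain leading space instead of `Ġ`, so the helper normalizes both.

```python
from typing import Dict, List, Tuple

# Byte-level BPE (GPT-2/RoBERTa style) marks a leading space with U+0120 ('Ġ').
SPACE_MARKER = "\u0120"


def clean_predictions(results: List[Dict]) -> List[Tuple[str, float]]:
    """Hypothetical helper: map fill-mask results to (word, score), best first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    # Turn the marker into a space, then trim whitespace on both sides.
    return [
        (r["token_str"].replace(SPACE_MARKER, " ").strip(), r["score"])
        for r in ranked
    ]


# Two entries copied from the example output, deliberately out of order.
sample = [
    {"sequence": "<s>Hann fór út að undanförnu.</s>", "score": 0.029087543487548828,
     "token": 7590, "token_str": "Ġundanförnu"},
    {"sequence": "<s>Hann fór út að nýju.</s>", "score": 0.03395755589008331,
     "token": 2219, "token_str": "Ġnýju"},
]

for word, score in clean_predictions(sample):
    print(f"{word}: {score:.4f}")
```

Sorting inside the helper keeps it correct even if results are merged from several calls; for a single pipeline call the list is already ranked by score.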
## Authors
- Bobby Donchev: `contact [at] donchev.is`
- Elena Cramer: `elena.cramer [at] neurocode.io`
## About us
We help our customers take AI software live.

Our focus: AI software development.

Get in touch: [LinkedIn](https://de.linkedin.com/company/neurocodeio) | [Website](https://neurocode.io)