---
language: sv
---
# A Swedish BERT model
## Model description
This model follows the BERT Large architecture as implemented in the [Megatron-LM framework](https://github.com/NVIDIA/Megatron-LM). It was trained with a batch size of 512 for 600k steps. The model has the following hyperparameters:

| Hyperparameter       | Value |
|----------------------|-------|
| \\(n_{parameters}\\) | 340M  |
| \\(n_{layers}\\)     | 24    |
| \\(n_{heads}\\)      | 16    |
| \\(n_{ctx}\\)        | 1024  |
| \\(n_{vocab}\\)      | 30592 |

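
As a rough sanity check on the hyperparameters above, the parameter count can be estimated from the layer count, context length, and vocabulary size. The sketch below is not taken from the Megatron-LM code: the hidden size (1024), feed-forward size (4096), two token-type embeddings, and the pooler layer are assumptions based on the standard BERT Large configuration, and `bert_param_count` is a hypothetical helper name.

```python
# Rough parameter count for a BERT-Large-style encoder.
# From the table: n_layers=24, n_ctx=1024, n_vocab=30592.
# Assumed (not stated in this card): hidden size 1024, feed-forward size 4096,
# 2 token-type embeddings, biases on every linear layer, and a pooler layer.

def bert_param_count(n_layers=24, n_ctx=1024, n_vocab=30592,
                     hidden=1024, ffn=4096, n_token_types=2):
    layer_norm = 2 * hidden  # scale + bias
    # Token, position and token-type embeddings, plus the embedding LayerNorm.
    embeddings = (n_vocab + n_ctx + n_token_types) * hidden + layer_norm
    # Query, key, value and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden)
    # Two feed-forward projections (weights + biases).
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    per_layer = attention + feed_forward + 2 * layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + n_layers * per_layer + pooler

total = bert_param_count()
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate comes out at roughly 336M, which is consistent with the rounded 340M figure in the table. The masked-language-model head is left out of the count.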
## Training data
The model was pretrained on a Swedish text corpus of about 85 GB drawn from a variety of sources, as shown below.

| Dataset          | Genre    | Size (GB) |
|------------------|----------|-----------|
| Anföranden       | Politics | 0.9       |
| DCEP             | Politics | 0.6       |
| DGT              | Politics | 0.7       |
| Fass             | Medical  | 0.6       |
| Författningar    | Legal    | 0.1       |
| Web data         | Misc     | 45.0      |
| JRC              | Legal    | 0.4       |
| Litteraturbanken | Books    | 0.3       |
| SCAR             | Misc     | 28.0      |
| SOU              | Politics | 5.3       |
| Subtitles        | Drama    | 1.3       |
| Wikipedia        | Facts    | 1.8       |

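
To see how the corpus is distributed across genres, the table can be tallied with a few lines of Python. This is only an illustrative sketch: the figures are copied from the table, the Litteraturbanken entry is taken as 0.3 GB (an assumption), and `CORPUS` and `size_by_genre` are hypothetical names.

```python
from collections import defaultdict

# (dataset, genre, size in GB), copied from the table above.
# Assumption: the Litteraturbanken entry is taken as 0.3 GB.
CORPUS = [
    ("Anföranden", "Politics", 0.9),
    ("DCEP", "Politics", 0.6),
    ("DGT", "Politics", 0.7),
    ("Fass", "Medical", 0.6),
    ("Författningar", "Legal", 0.1),
    ("Web data", "Misc", 45.0),
    ("JRC", "Legal", 0.4),
    ("Litteraturbanken", "Books", 0.3),
    ("SCAR", "Misc", 28.0),
    ("SOU", "Politics", 5.3),
    ("Subtitles", "Drama", 1.3),
    ("Wikipedia", "Facts", 1.8),
]

def size_by_genre(rows):
    """Sum the dataset sizes (GB) for each genre."""
    totals = defaultdict(float)
    for _name, genre, size_gb in rows:
        totals[genre] += size_gb
    return dict(totals)

total_gb = sum(size_gb for _name, _genre, size_gb in CORPUS)
genre_totals = size_by_genre(CORPUS)

# Print genres from largest to smallest, with their share of the corpus.
for genre, size_gb in sorted(genre_totals.items(), key=lambda kv: -kv[1]):
    print(f"{genre:<10}{size_gb:6.1f} GB  ({size_gb / total_gb:5.1%})")
print(f"{'Total':<10}{total_gb:6.1f} GB")
```

The rows add up to the stated 85 GB, and the two "Misc" sources (Web data and SCAR) together account for about 73 GB, or roughly 86% of the corpus.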
## Intended uses & limitations
The raw model can be used for masked language modeling or next sentence prediction. It is also often fine-tuned on a downstream task to improve its performance in a specific domain or task.
<br>
<br>
## How to use
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the model with its masked-language-modeling head from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("AI-Nordics/bert-large-swedish-cased")
model = AutoModelForMaskedLM.from_pretrained("AI-Nordics/bert-large-swedish-cased")
```