|
--- |
|
language: pl |
|
tags: |
|
- T5 |
|
- lemmatization |
|
license: apache-2.0 |
|
--- |
|
|
|
|
|
# PoLemma Large |
|
|
|
PoLemma models are intended for lemmatization of named entities and multi-word expressions in the Polish language. |
|
|
|
They were fine-tuned from the corresponding allegro/plT5 models; this variant is based on [allegro/plt5-large](https://huggingface.co/allegro/plt5-large).
|
|
|
## Usage |
|
|
|
Sample usage: |
|
|
|
```python
from transformers import pipeline

# Load the lemmatization pipeline; this card describes the large variant.
pipe = pipeline(task="text2text-generation", model="amu-cai/polemma-large", tokenizer="amu-cai/polemma-large")

# Lemmatize a multi-word expression; beam search is used for more reliable decoding.
hyp = [res['generated_text'] for res in pipe(["federalnego urzędu statystycznego"], clean_up_tokenization_spaces=True, num_beams=5)][0]
```
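
Because the pipeline accepts a list of inputs and returns one generated lemma per entry, several expressions can be lemmatized in a single call. The snippet below is a minimal sketch of such batch usage; the `lemmatize` helper and the second example phrase are illustrative, not part of the released code.

```python
# Minimal sketch of batch lemmatization (helper name and extra example phrase are illustrative).
def lemmatize(phrases, pipe, num_beams=5):
    results = pipe(phrases, clean_up_tokenization_spaces=True, num_beams=num_beams)
    return [res['generated_text'] for res in results]

lemmas = lemmatize(["federalnego urzędu statystycznego", "Unii Europejskiej"], pipe)
print(lemmas)
```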
|
|
|
|
|
## Evaluation results |
|
|
|
Lemmatization Exact Match was computed on the SlavNER 2021 test set. |
|
|
|
| Model | Exact Match (%) |
| :--- | ---: |
| [polemma-large]() | 92.61 |
| [polemma-base]() | 91.34 |
| [polemma-small]() | 88.46 |
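
Exact match is understood here as the percentage of test expressions whose predicted lemma is string-identical to the reference lemma. The sketch below illustrates that computation under this assumption; it is not the exact evaluation script used for the results above.

```python
# Illustrative exact-match computation (assumed definition, not the official evaluation script).
def exact_match(predictions, references):
    assert len(predictions) == len(references)
    hits = sum(pred.strip() == ref.strip() for pred, ref in zip(predictions, references))
    return 100.0 * hits / len(references)
```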
|
|
|
|
|
## Citation |
|
|
|
If you use the model, please cite the following paper: |
|
|
|
TBD |
|
|
|
### Framework versions |
|
|
|
- Transformers 4.26.0 |
|
- PyTorch 1.13.1.post200
|
- Datasets 2.9.0 |
|
- Tokenizers 0.13.2 |
|
|