	---
	language:
	- pl
	- cs
	- ru
	tags:
	- mT5
	- lemmatization
	license: apache-2.0
	---


	# SlavLemma Large

	SlavLemma models lemmatize named entities and multi-word expressions in Polish, Czech, and Russian.

	They were fine-tuned from Google's mT5 models, e.g. [google/mt5-large](https://huggingface.co/google/mt5-large).

	## Usage

	When using the model, prepend one of the language tokens (`>>pl<<`, `>>cs<<`, `>>ru<<`) to the input, based on the language of the phrase you want to lemmatize.

	Sample usage:

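	```python
	from transformers import pipeline

	# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
	pipe = pipeline(task="text2text-generation", model="amu-cai/slavlemma-large", tokenizer="amu-cai/slavlemma-large")
	# The input starts with the language token; decoding uses beam search with 5 beams.
	outputs = pipe([">>pl<< federalnego urzędu statystycznego"], clean_up_tokenization_spaces=True, num_beams=5)
	hyp = [res["generated_text"] for res in outputs][0]
	```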
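
When lemmatizing phrases in more than one language, it can help to build the prefixed inputs programmatically. The following is a minimal sketch: `LANGUAGE_TOKENS` and `add_language_token` are hypothetical helper names (not part of the model or of `transformers`), the single space after the token follows the sample usage above, and the Czech phrase is only an illustrative input.

```python
# Hypothetical helper, not part of the model or the transformers library.
LANGUAGE_TOKENS = {"pl": ">>pl<<", "cs": ">>cs<<", "ru": ">>ru<<"}


def add_language_token(phrase: str, lang: str) -> str:
    """Prefix a phrase with the language token for the given language code."""
    try:
        token = LANGUAGE_TOKENS[lang]
    except KeyError:
        raise ValueError(f"Unsupported language: {lang!r}") from None
    # One space separates the token from the phrase, as in the sample usage.
    return f"{token} {phrase.strip()}"


# Illustrative inputs in two of the supported languages.
inputs = [
    add_language_token("federalnego urzędu statystycznego", "pl"),
    add_language_token("Evropské unie", "cs"),
]
```

The resulting `inputs` list can then be passed to the pipeline in place of the hard-coded string.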


	## Evaluation results

	Lemmatization Exact Match was computed on the SlavNER 2021 test sets (COVID-19 and USA 2020 Elections).


	COVID-19:

	| Model | pl | cs | ru |
	| :------ | ------: | ------: | ------: |
	| [slavlemma-large](https://huggingface.co/amu-cai/slavlemma-large) | 93.76 | 89.80 | 77.30 |
	| [slavlemma-base](https://huggingface.co/amu-cai/slavlemma-base) | 91.00 | 86.29 | 76.10 |
	| [slavlemma-small](https://huggingface.co/amu-cai/slavlemma-small) | 86.80 | 80.98 | 73.83 |

	USA 2020 Elections:

	| Model | pl | cs | ru |
	| :------ | ------: | ------: | ------: |
	| [slavlemma-large](https://huggingface.co/amu-cai/slavlemma-large) | 89.12 | 87.27 | 82.50 |
	| [slavlemma-base](https://huggingface.co/amu-cai/slavlemma-base) | 84.19 | 81.97 | 80.27 |
	| [slavlemma-small](https://huggingface.co/amu-cai/slavlemma-small) | 78.85 | 75.86 | 76.18 |
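
Exact Match is commonly understood as the share of predicted lemmas that are identical to their reference lemmas. The sketch below is one way to compute it, assuming a case-sensitive comparison after trimming surrounding whitespace and a result expressed as a percentage. The normalization actually used for the scores above is not stated in this card, `exact_match` is a hypothetical helper name, and the example strings are illustrative rather than taken from the test sets.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of predictions identical to their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one reference is required")
    # Assumption: case-sensitive match, ignoring leading/trailing whitespace.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Illustrative data: the second prediction differs from its reference in casing.
predictions = ["federalny urząd statystyczny", "Unia europejska"]
references = ["federalny urząd statystyczny", "Unia Europejska"]
score = exact_match(predictions, references)
```

Under these assumptions a casing difference counts as a miss, which matters for named entities.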


	## Citation

	If you use the model, please cite the following paper:

	TBD

	### Framework versions

	- Transformers 4.26.0
	- PyTorch 1.13.1.post200
	- Datasets 2.9.0
	- Tokenizers 0.13.2