|
--- |
|
annotations_creators: |
|
- machine-generated |
|
language_creators: |
|
- machine-generated |
|
languages: |
|
- de |
|
- en |
|
- es |
|
- fr |
|
- it |
|
- nl |
|
- pl |
|
- pt |
|
- ru |
|
licenses: |
|
- cc-by-nc-sa-4.0 |
|
pretty_name: wikineural-dataset |
|
source_datasets: |
|
- original |
|
task_categories: |
|
- structure-prediction |
|
task_ids: |
|
- named-entity-recognition |
|
|
|
--- |
|
|
|
## Model Description |
|
|
|
- **Summary:** A multilingual BERT (mBERT) model fine-tuned for 3 epochs on the recently introduced WikiNEuRal dataset for multilingual Named Entity Recognition (NER). The model supports the 9 languages covered by WikiNEuRal (de, en, es, fr, it, nl, pl, pt, ru) and was trained on all of them jointly; see the usage sketch after this list. |
|
- **Official Repository:** [https://github.com/Babelscape/wikineural](https://github.com/Babelscape/wikineural) |
|
- **Paper:** [WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER](https://aclanthology.org/2021.findings-emnlp.215/) |
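
Below is a minimal usage sketch with the Hugging Face `transformers` library. The checkpoint id `Babelscape/wikineural-multilingual-ner` and the example sentences are assumptions for illustration, not taken from the original card; substitute the actual Hub id of the released model.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# NOTE: the checkpoint id below is an assumption for illustration; replace it
# with the id of the released WikiNEuRal NER model on the Hugging Face Hub.
model_name = "Babelscape/wikineural-multilingual-ner"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Token-classification pipeline; "simple" aggregation merges word pieces
# back into whole-entity spans.
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# The model was trained jointly on all 9 WikiNEuRal languages, so the same
# pipeline handles sentences in any of them.
examples = [
    "My name is Wolfgang and I live in Berlin.",  # en
    "Mi chiamo Marco e vivo a Roma.",             # it
]

for sentence in examples:
    for entity in ner(sentence):
        print(f"{entity['word']}\t{entity['entity_group']}\t{entity['score']:.3f}")
```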
|
|
|
## Licensing Information |
|
|
|
The contents of this repository are restricted to non-commercial research purposes only under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright of the dataset contents and models remains with the original copyright holders. |
|
|
|
## Citation Information |
|
|
|
```bibtex |
|
@inproceedings{tedeschi-etal-2021-wikineural-combined, |
|
title = "{W}iki{NE}u{R}al: {C}ombined Neural and Knowledge-based Silver Data Creation for Multilingual {NER}", |
|
author = "Tedeschi, Simone and |
|
Maiorca, Valentino and |
|
Campolungo, Niccol{\`o} and |
|
Cecconi, Francesco and |
|
Navigli, Roberto", |
|
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021", |
|
month = nov, |
|
year = "2021", |
|
address = "Punta Cana, Dominican Republic", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://aclanthology.org/2021.findings-emnlp.215", |
|
pages = "2521--2533", |
|
abstract = "Multilingual Named Entity Recognition (NER) is a key intermediate task which is needed in many areas of NLP. In this paper, we address the well-known issue of data scarcity in NER, especially relevant when moving to a multilingual scenario, and go beyond current approaches to the creation of multilingual silver data for the task. We exploit the texts of Wikipedia and introduce a new methodology based on the effective combination of knowledge-based approaches and neural models, together with a novel domain adaptation technique, to produce high-quality training corpora for NER. We evaluate our datasets extensively on standard benchmarks for NER, yielding substantial improvements up to 6 span-based F1-score points over previous state-of-the-art systems for data creation.", |
|
} |
|
``` |
|
|
|
|