Amharic BERT and RoBERTa collection: BERT and RoBERTa transformer encoder models pretrained on 290 million tokens of Amharic text
This is a fine-tuned version of the bert-medium-amharic model on the amharic-named-entity-recognition dataset and is ready to use for named entity recognition (NER).
It achieves the following results on the evaluation set:
Precision: 0.65
Recall: 0.73
F1: 0.69

You can use this model directly with a pipeline for token classification:
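from transformers import pipeline

# Fine-tuned Amharic NER checkpoint on the Hugging Face Hub
checkpoint = "rasyosef/bert-medium-amharic-finetuned-ner"

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
token_classifier = pipeline("token-classification", model=checkpoint, aggregation_strategy="simple")

# Run NER on an Amharic sentence
token_classifier("አትሌት ኃይሌ ገ/ሥላሴ ኒውዮርክ ውስጥ በሚደረገው የተባበሩት መንግሥታት ድርጅት ልዩ የሰላም ስብሰባ ላይ እንዲገኝ ተጋበዘ።")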
Output:
[{'entity_group': 'TTL',
'score': 0.9841112,
'word': 'አትሌት',
'start': 0,
'end': 4},
{'entity_group': 'PER',
'score': 0.99379075,
'word': 'ኃይሌ ገ / ሥላሴ',
'start': 5,
'end': 14},
{'entity_group': 'LOC',
'score': 0.8818362,
'word': 'ኒውዮርክ',
'start': 15,
'end': 20},
{'entity_group': 'ORG',
'score': 0.99056435,
'word': 'የተባበሩት መንግሥታት ድርጅት',
'start': 32,
'end': 50}]
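
The pipeline returns a list of dictionaries, one per detected entity span. As a minimal sketch of how that list might be post-processed, the snippet below groups the predicted words by entity label and drops low-confidence spans. The helper name `group_entities`, its `min_score` parameter, and the 0.9 threshold are illustrative assumptions and are not part of the model or of `transformers`. The sample predictions are copied from the output above, so the snippet runs without downloading the model.

```python
from collections import defaultdict

# Sample predictions copied from the pipeline output shown above
predictions = [
    {"entity_group": "TTL", "score": 0.9841112, "word": "አትሌት", "start": 0, "end": 4},
    {"entity_group": "PER", "score": 0.99379075, "word": "ኃይሌ ገ / ሥላሴ", "start": 5, "end": 14},
    {"entity_group": "LOC", "score": 0.8818362, "word": "ኒውዮርክ", "start": 15, "end": 20},
    {"entity_group": "ORG", "score": 0.99056435, "word": "የተባበሩት መንግሥታት ድርጅት", "start": 32, "end": 50},
]


def group_entities(predictions, min_score=0.0):
    """Hypothetical helper: map each entity label to the words predicted with it.

    Spans whose confidence score is below `min_score` are skipped.
    """
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# 0.9 is an arbitrary threshold chosen only for illustration
entities = group_entities(predictions, min_score=0.9)
print(entities)
```

With this threshold the `LOC` span (score of about 0.88) is filtered out, while the `TTL`, `PER`, and `ORG` spans are kept. Calling `group_entities(predictions)` without a threshold keeps all four spans.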
GitHub repository: https://github.com/rasyosef/amharic-named-entity-recognition