---
tags:
- flair
- token-classification
- sequence-tagger-model
language:
- id
---
## Indonesian POS Tagging in Flair

This is a part-of-speech (POS) tagging model for Indonesian, built with [Flair](https://github.com/flairNLP/flair/). It uses **FastText** word embeddings.

- F-score (micro) = **0.9345**
- F-score (macro) = **0.8735**
- Accuracy = **0.9345**

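
The identical micro F-score and accuracy are expected for single-label tagging, where every token receives exactly one tag: each wrong prediction counts once as a false positive and once as a false negative, so the micro-averaged F-score reduces to accuracy. The macro score averages the per-tag F-scores, so rare tags weigh as much as frequent ones. A minimal sketch on made-up tag sequences follows; the data and the function name `f_scores` are illustrative and not taken from this model's evaluation, and Flair's own evaluation may differ in which tags it includes in the macro average.

```python
from collections import Counter

def f_scores(gold, pred):
    """Return (micro_f1, macro_f1, accuracy) for single-label tagging."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted tag is a false positive
            fn[g] += 1  # the gold tag is a false negative

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    tags = set(gold) | set(pred)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[t], fp[t], fn[t]) for t in tags) / len(tags)
    accuracy = sum(tp.values()) / len(gold)
    return micro, macro, accuracy

# Hypothetical gold and predicted tags for six tokens.
gold = ["PRON", "VERB", "ADP", "NOUN", "NOUN", "ADJ"]
pred = ["PRON", "VERB", "ADP", "NOUN", "PROPN", "NOUN"]
micro, macro, accuracy = f_scores(gold, pred)
print(f"micro={micro:.4f} macro={macro:.4f} accuracy={accuracy:.4f}")
```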

Predicts 19 tags. The table below describes the 17 Universal POS (UPOS) tags among them:

| **Tag** | **Meaning** |
|---------|-------------|
| NOUN | Noun (person, place, thing, or idea) |
| PROPN | Proper noun (specific name) |
| PUNCT | Punctuation (marks like commas, periods, etc.) |
| VERB | Verb (action or state) |
| ADP | Adposition (prepositions or postpositions) |
| PRON | Pronoun (substitute for a noun) |
| ADJ | Adjective (describes a noun) |
| NUM | Numeral (number or quantity) |
| DET | Determiner (a word that modifies a noun) |
| CCONJ | Coordinating conjunction (joins clauses or words) |
| ADV | Adverb (modifies a verb, adjective, or another adverb) |
| AUX | Auxiliary verb (helps the main verb) |
| SCONJ | Subordinating conjunction (introduces subordinate clauses) |
| PART | Particle (small word that doesn’t change in form, e.g., "not") |
| SYM | Symbol (mathematical or other special symbols) |
| X | Other (words that don't fit standard POS categories) |
| INTJ | Interjection (expresses strong emotion or reaction) |

---
### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`).

You also need to download the **model** file locally to use it.

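
A minimal sketch of loading the downloaded file, assuming it was saved under a hypothetical path such as `models/pos-id/final-model.pt`. The path and the helper name `load_local_tagger` are illustrative, so substitute wherever you stored the download. The explicit existence check makes a wrong path fail immediately with a clear message.

```python
from pathlib import Path

def load_local_tagger(model_path):
    """Load a Flair SequenceTagger from a local file, failing early if it is missing."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    # Imported here so the path check runs before Flair itself is loaded.
    from flair.models import SequenceTagger
    return SequenceTagger.load(str(path))

# Hypothetical location; replace with the real path to the downloaded model.
# tagger = load_local_tagger("models/pos-id/final-model.pt")
```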

Training and fine-tuning code is available here: https://github.com/bwbayu/product_name_clustering/blob/main/additional/train_pos_flair.ipynb

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the tagger; "model" is the path to the locally downloaded model file
tagger = SequenceTagger.load("model")

# Tag an example sentence ("I go to the market")
text = "aku pergi ke pasar"
sentence = Sentence(text)
tagger.predict(sentence)

# Print each token with its predicted POS tag
for token in sentence:
    print(f"{token.text} ({token.get_label('upos').value})")
```

This yields the following output:

```
aku (PRON)
pergi (VERB)
ke (ADP)
pasar (NOUN)
```
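
To work with the predictions rather than print them, the token/tag pairs can be collected into plain Python data. A minimal sketch, assuming the pairs are gathered from the tagged sentence as `(token.text, token.get_label('upos').value)`; the helper name `group_by_tag` is illustrative, and the sample pairs simply restate the output shown above.

```python
from collections import defaultdict

def group_by_tag(tagged_tokens):
    """Group token texts by their predicted tag, preserving token order."""
    groups = defaultdict(list)
    for text, tag in tagged_tokens:
        groups[tag].append(text)
    return dict(groups)

# With Flair, the pairs would come from the tagged sentence, e.g.:
# tagged = [(t.text, t.get_label("upos").value) for t in sentence]
tagged = [("aku", "PRON"), ("pergi", "VERB"), ("ke", "ADP"), ("pasar", "NOUN")]

groups = group_by_tag(tagged)
print(groups["NOUN"])  # ['pasar']
```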

---

### Cite

Please cite the following paper when using this model.

```
@inproceedings{akbik2018coling,
  title     = {Contextual String Embeddings for Sequence Labeling},
  author    = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}
```

---
### Issues?

The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).