m3hrdadfi committed on
Commit 2c1d19d · 1 Parent(s): dd56fb8

Create README.md

Files changed (1)
  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ language: fa
+ license: apache-2.0
+ ---
+
+ # ParsBERT (v2.0)
+ A Transformer-based Model for Persian Language Understanding
+
+ We rebuilt the vocabulary and fine-tuned ParsBERT v1.1 on new Persian corpora so that ParsBERT can be used in a wider range of applications.
+ Please follow the [ParsBERT](https://github.com/hooshvare/parsbert) repo for the latest information about previous and current models.
+
+
+ ## Persian NER [ARMAN, PEYMA]
+
+ This task aims to extract named entities from text, such as person names, and label them with the appropriate `NER` classes, such as locations, organizations, etc. The datasets used for this task contain sentences annotated in `IOB` format. In this format, tokens that are not part of an entity are tagged `"O"`, the `"B"` tag marks the first token of an entity, and the `"I"` tag marks the remaining tokens of the same entity. Both `"B"` and `"I"` tags are followed by a hyphen (or underscore) and then the entity category. NER is therefore a multi-class token classification problem: given raw text, the model assigns a label to each token. The two primary datasets for Persian NER are `ARMAN` and `PEYMA`.
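
To make the `IOB` scheme concrete, here is a minimal sketch that groups a tag sequence into entity spans. The function name `iob_to_spans`, the example sentence, and the category names `PER` and `ORG` are illustrative assumptions, not taken from the datasets or from the ParsBERT code.

```python
from typing import List, Tuple

# (category, start index, end index), with the end index exclusive.
Span = Tuple[str, int, int]


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Group a sequence of IOB tags into entity spans (hypothetical helper)."""
    spans: List[Span] = []
    start, category = None, None
    for i, tag in enumerate(tags):
        # "B-PER" and "B_PER" both split into prefix "B" and label "PER";
        # a bare "O" yields prefix "O" and an empty label.
        prefix, label = tag[:1], tag[2:]
        continues_entity = prefix == "I" and category == label
        if not continues_entity and category is not None:
            # The open entity ends just before this token.
            spans.append((category, start, i))
            start, category = None, None
        if prefix == "B" or (prefix == "I" and not continues_entity):
            # A stray "I" tag is treated leniently as the start of an entity.
            start, category = i, label
    if category is not None:
        spans.append((category, start, len(tags)))
    return spans


# Illustrative example; the tokens and category names are made up.
tokens = ["Ali", "Daei", "visited", "Tehran", "University"]
tags = ["B_PER", "I_PER", "O", "B_ORG", "I_ORG"]
spans = iob_to_spans(tags)
for category, start, end in spans:
    print(category, " ".join(tokens[start:end]))
```

Spans are returned as token index ranges, not strings, so they can be compared directly against gold annotations.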
+
+
+ ### PEYMA
+
+ The PEYMA dataset includes 7,145 sentences with a total of 302,530 tokens, of which 41,148 are tagged with one of seven classes:
+
+ 1. Organization
+ 2. Money
+ 3. Location
+ 4. Date
+ 5. Time
+ 6. Person
+ 7. Percent
+
+
+ | Label | Tagged tokens |
+ |:------------:|:-----:|
+ | Organization | 16964 |
+ | Money | 2037 |
+ | Location | 8782 |
+ | Date | 4259 |
+ | Time | 732 |
+ | Person | 7675 |
+ | Percent | 699 |
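
As a quick consistency check, the per-label counts in the table can be summed and compared with the totals quoted above. This is only a sketch: the variable names are ours and the numbers are copied from this README.

```python
# Per-label counts copied from the table above.
peyma_label_counts = {
    "Organization": 16964,
    "Money": 2037,
    "Location": 8782,
    "Date": 4259,
    "Time": 732,
    "Person": 7675,
    "Percent": 699,
}
total_tokens = 302530  # total token count quoted for PEYMA

total_tagged = sum(peyma_label_counts.values())
tagged_share = total_tagged / total_tokens

print(f"tagged tokens: {total_tagged}")
print(f"share of all tokens: {tagged_share:.1%}")
```

The sum matches the 41,148 tagged tokens stated above, which is roughly 13.6% of the corpus.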
+
+
+ **Download**
+ You can download the dataset from [here](http://nsurl.org/tasks/task-7-named-entity-recognition-ner-for-farsi/).
+
+ ## Results
+
+ The following table summarizes the F1 scores obtained by ParsBERT compared to other models and architectures.
+
+ | Dataset | ParsBERT v2 | ParsBERT v1 | mBERT | MorphoBERT | Beheshti-NER | LSTM-CRF | Rule-Based CRF | BiLSTM-CRF |
+ |---------|-------------|-------------|-------|------------|--------------|----------|----------------|------------|
+ | PEYMA | 93.40* | 93.10 | 86.64 | - | 90.59 | - | 84.00 | - |
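
For readers unfamiliar with the metric, the sketch below shows one common way an F1 score is computed for NER: exact-match comparison of predicted and gold entity spans. This README does not say whether the scores above are entity-level or token-level, or which evaluation script produced them, so treat this as a generic illustration. The function `span_f1` and the example spans are hypothetical.

```python
from typing import Set, Tuple

# (category, start index, end index), with the end index exclusive.
Span = Tuple[str, int, int]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1 over entity spans (hypothetical helper)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up spans: one of two predictions matches one of three gold entities.
gold = {("PER", 0, 2), ("ORG", 3, 5), ("LOC", 7, 8)}
predicted = {("PER", 0, 2), ("ORG", 3, 4)}
f1 = span_f1(gold, predicted)
print(f"F1 = {f1:.2f}")
```

Here precision is 1/2 and recall is 1/3, giving an F1 of 0.40. The table above reports scores as percentages.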
+
+
+ ## How to use :hugs:
+
+ | Notebook | Description | |
+ |:----------|:-------------|------:|
+ | [How to use Pipelines](https://github.com/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb) | A simple and efficient way to use state-of-the-art models on downstream tasks through `transformers` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb) |
+
+
+ ### BibTeX entry and citation info
+
+ Please cite this work in publications as follows:
+
+ ```bibtex
+ @article{ParsBERT,
+ title={ParsBERT: Transformer-based Model for Persian Language Understanding},
+ author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
+ journal={ArXiv},
+ year={2020},
+ volume={abs/2005.12515}
+ }
+ ```
+
+ ## Questions?
+ Post a GitHub issue on the [ParsBERT Issues](https://github.com/hooshvare/parsbert/issues) page.