marianaossilva committed on
Commit 30b438d · verified · 1 Parent(s): a3d069a

Update README.md

Files changed (1)
  1. README.md +83 -3
README.md CHANGED
@@ -1,3 +1,83 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - pt
+ metrics:
+ - name: Precision
+   type: Precision
+   value: 0.783
+ - name: Recall
+   type: Recall
+   value: 0.774
+ - name: F1-Score
+   type: F1-Score
+   value: 0.779
+ library_name: transformers
+ pipeline_tag: token-classification
+ tags:
+ - BERT
+ - CRF
+ - NER
+ - Portuguese
+ - Literature
+ ---
+
+ # LitBERT-CRF
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ LitBERT-CRF is a fine-tuned BERT-CRF model designed for Named Entity Recognition (NER) in Portuguese-language literature.
+
+ ## Model Details
+
+ ### Model Description
+
+ LitBERT-CRF builds on a BERT-CRF model that was pre-trained on the brWaC corpus and fine-tuned on the HAREM dataset for Portuguese NER.
+ It is further adapted to the literary domain through Masked Language Modeling (MLM) on domain-specific literary data, which makes it well suited to identifying named entities in literary texts.
+
+ - **Model type:** BERT-CRF for NER
+ - **Language:** Portuguese
+ - **Fine-tuned from model:** BERT-CRF on brWaC and HAREM
+
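The card does not show how the CRF layer is implemented, so the following is only a minimal sketch of the general idea behind a BERT-CRF tagger: the encoder produces per-token tag scores (emissions), and the CRF picks the best-scoring tag sequence with Viterbi decoding using learned tag-to-tag transition scores. The tag set, the score values, and the function name `viterbi_decode` are illustrative assumptions, not taken from LitBERT-CRF.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions:   per-token tag scores, shape [num_tokens][num_tags]
    transitions: transitions[i][j] is the score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    # Best score of any path ending in each tag at the first token.
    scores = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Best previous tag to arrive at tag j from.
            best_i = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            step_scores.append(scores[best_i] + transitions[best_i][j] + emission[j])
            step_back.append(best_i)
        scores = step_scores
        backpointers.append(step_back)
    # Follow the backpointers from the best final tag.
    path = [max(range(num_tags), key=lambda j: scores[j])]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores, for illustration only.
tags = ["O", "B-PER", "I-PER"]
transitions = [[0.0, 0.0, 0.0] for _ in tags]
transitions[0][2] = -1e4  # discourage the invalid move O -> I-PER
emissions = [
    [1.0, 0.8, 0.0],  # token 1: "O" is slightly preferred in isolation
    [0.0, 0.2, 2.0],  # token 2: strongly "I-PER"
    [2.0, 0.0, 0.1],  # token 3: strongly "O"
]
decoded = [tags[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

In this toy input, picking the top tag per token independently would yield `O, I-PER, O`, which is not a valid BIO sequence; the transition penalty steers decoding toward a sequence that opens the entity with `B-PER`.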
43
+
44
+ ### Testing Data, Factors & Metrics
45
+
46
+ #### Testing Data
47
+
48
+ PPORTAL_ner dataset
49
+
50
+ #### Metrics
51
+
52
+ - **Precision**: 0.783
53
+ - **Recall**: 0.774
54
+ - **F1-score**: 0.779
55
+
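As a quick sanity check on the metrics above, F1 can be recomputed as the harmonic mean of precision and recall. This sketch assumes the reported figures are aggregate precision and recall over the same set of predictions; the card does not state the averaging scheme, and the helper name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the model card, already rounded to three decimals.
precision, recall = 0.783, 0.774
f1 = f1_score(precision, recall)
print(f"F1 from rounded precision/recall: {f1:.3f}")
```

Recomputing from the rounded inputs gives a value within about 0.001 of the reported 0.779; a gap of that size is plausibly due to rounding of the inputs, though a different averaging scheme could also explain it.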
+ ## Citation
+
+ **BibTeX:**
+ ```
+ @inproceedings{silva-moro-2024-evaluating,
+     title = "Evaluating Pre-training Strategies for Literary Named Entity Recognition in {P}ortuguese",
+     author = "Silva, Mariana O. and
+       Moro, Mirella M.",
+     editor = "Gamallo, Pablo and
+       Claro, Daniela and
+       Teixeira, Ant{\'o}nio and
+       Real, Livy and
+       Garcia, Marcos and
+       Oliveira, Hugo Gon{\c{c}}alo and
+       Amaro, Raquel",
+     booktitle = "Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1",
+     month = mar,
+     year = "2024",
+     address = "Santiago de Compostela, Galicia/Spain",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2024.propor-1.39",
+     pages = "384--393",
+ }
+ ```
+
+ **APA:**
+
+ Mariana O. Silva and Mirella M. Moro. 2024. Evaluating Pre-training Strategies for Literary Named Entity Recognition in Portuguese. In Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1, pages 384–393, Santiago de Compostela, Galicia/Spain. Association for Computational Linguistics.