Create README.md
README.md
---
datasets:
- adsabs/WIESP2022-NER
language:
- en
tags:
- physics
- computer science
---

PCSciBERT_uncased was initialized from the uncased variant of SciBERT (https://huggingface.co/allenai/scibert_scivocab_uncased) and further pre-trained on text from 1,560,661 arXiv research articles in the physics and computer science domains. The PCSciBERT_uncased tokenizer uses the same vocabulary as allenai/scibert_scivocab_uncased.
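
Since the model keeps SciBERT's architecture and vocabulary, it should load through the standard `transformers` Auto classes. The following is a minimal sketch, not an official usage recipe: the helper name `load_masked_lm` is made up for illustration, and the Hub ID of PCSciBERT_uncased is not stated in this README, so the example call uses the SciBERT base checkpoint named above.

```python
# Hub ID of the base checkpoint, taken from the README text.
SCIBERT_ID = "allenai/scibert_scivocab_uncased"


def load_masked_lm(model_id: str):
    """Load the tokenizer and masked-LM weights for a BERT-style checkpoint.

    Hypothetical helper for illustration only; pass the Hub ID of
    PCSciBERT_uncased to load that model instead of the base checkpoint.
    """
    # Imported here so that defining the helper does not require transformers.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model


# Example call (downloads weights from the Hugging Face Hub):
# tokenizer, model = load_masked_lm(SCIBERT_ID)
```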

The model was also evaluated on downstream named entity recognition (NER) using the adsabs/WIESP2022-NER and CS-NER (https://github.com/jd-coderepos/contributions-ner-cs/tree/main) datasets. Overall, PCSciBERT_cased achieved higher micro F1 scores on both the WIESP (micro F1: 81.54%) and CS-NER (micro F1: 75.67%) datasets.

It improves on the performance of SciBERT (uncased) by 0.26% on the CS-NER test set and by 0.8% on the WIESP test set.
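
For readers unfamiliar with the metric, micro F1 pools true positives, false positives, and false negatives over all entity types before computing precision and recall. The sketch below illustrates that general definition with exact span-and-label matching. It is not the evaluation script behind the scores above, which this README does not specify, and the entity spans and label names in the toy data are invented.

```python
from typing import List, Set, Tuple

# An entity is (start_token, end_token, label). All values below are made up.
Entity = Tuple[int, int, str]


def micro_f1(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """Micro-averaged F1: pool counts over all sentences and labels, then score."""
    tp = fp = fn = 0
    for gold_sent, pred_sent in zip(gold, pred):
        tp += len(gold_sent & pred_sent)  # exact span-and-label matches
        fp += len(pred_sent - gold_sent)  # predicted but not in gold
        fn += len(gold_sent - pred_sent)  # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two toy sentences: one prediction has the right span but the wrong label.
gold = [{(0, 1, "Telescope"), (4, 4, "Wavelength")}, {(2, 3, "Method")}]
pred = [{(0, 1, "Telescope"), (4, 4, "Model")}, {(2, 3, "Method")}]
score = micro_f1(gold, pred)  # tp=2, fp=1, fn=1
```

Because counts are pooled before averaging, frequent entity types weigh more heavily in micro F1 than rare ones.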