jmzk96 committed on
Commit 8718b82 · 1 Parent(s): 6dfe00d

Create README.md

Files changed (1)
  1. README.md +15 -0
README.md ADDED
@@ -0,0 +1,15 @@
+ ---
+ datasets:
+ - adsabs/WIESP2022-NER
+ language:
+ - en
+ tags:
+ - physics
+ - computer science
+ ---
+
+ PCSciBERT_uncased was initialized from the uncased variant of SciBERT (https://huggingface.co/allenai/scibert_scivocab_uncased) and then pre-trained on text from 1,560,661 arXiv research articles in the physics and computer science domains. The PCSciBERT_uncased tokenizer uses the same vocabulary as allenai/scibert_scivocab_uncased.
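Because the tokenizer shares its vocabulary with allenai/scibert_scivocab_uncased, a checkpoint of this kind can presumably be loaded in the usual `transformers` way. The following is a minimal sketch, not the author's own usage instructions. It assumes the `transformers` library with a PyTorch backend; the helper name `load_tokenizer_and_model` is hypothetical, and the default repo ID is the SciBERT one from the URL above, since this README does not state the Hub ID of PCSciBERT_uncased.

```python
# Repo ID taken from the SciBERT URL in the README. The Hub ID of
# PCSciBERT_uncased itself is not stated there, so pass it in explicitly
# once known (assumption: it loads the same way as SciBERT).
SCIBERT_ID = "allenai/scibert_scivocab_uncased"


def load_tokenizer_and_model(repo_id: str = SCIBERT_ID):
    """Load a tokenizer and encoder from the Hugging Face Hub (needs network access)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModel.from_pretrained(repo_id)
    return tokenizer, model
```

Calling `tokenizer, model = load_tokenizer_and_model()` and then `model(**tokenizer("some abstract text", return_tensors="pt"))` should yield contextual embeddings; for named entity recognition, a token-classification head would still have to be fine-tuned on top.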
+
+ The model was also evaluated on downstream named entity recognition using the adsabs/WIESP2022-NER and CS-NER (https://github.com/jd-coderepos/contributions-ner-cs/tree/main) datasets. Overall, PCSciBERT_cased achieved higher micro F1 scores on both the WIESP (micro F1: 81.54%) and CS-NER (micro F1: 75.67%) datasets.
+
+ It improves on the performance of SciBERT (uncased) by 0.26% on the CS-NER test set and by 0.8% on the WIESP test set.
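For readers unfamiliar with the metric, the sketch below shows one common way to compute a micro-averaged F1 score for named entity recognition: true positives, false positives, and false negatives are pooled over all sentences and entity types before precision and recall are taken. It assumes entity-level exact matching on (start, end, label) spans; the README does not say which matching scheme was used, and the entity labels in the example are purely illustrative.

```python
from typing import Iterable, Set, Tuple

# An entity is (start token index, end token index, label).
Entity = Tuple[int, int, str]


def micro_f1(gold: Iterable[Set[Entity]], pred: Iterable[Set[Entity]]) -> float:
    """Micro-averaged F1 with counts pooled across all sentences and labels."""
    tp = fp = fn = 0
    for gold_entities, pred_entities in zip(gold, pred):
        tp += len(gold_entities & pred_entities)  # exact span-and-label matches
        fp += len(pred_entities - gold_entities)  # predicted but not in gold
        fn += len(gold_entities - pred_entities)  # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only (hypothetical labels, two sentences).
gold = [{(0, 1, "Telescope"), (4, 5, "Person")}, {(2, 2, "Formula")}]
pred = [{(0, 1, "Telescope")}, {(2, 2, "Formula"), (5, 6, "Person")}]
score = micro_f1(gold, pred)  # tp=2, fp=1, fn=1, so P = R = F1 = 2/3
```

Because counts are pooled before averaging, frequent entity types dominate the micro score, which is why it can differ noticeably from a macro average on imbalanced label sets.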