---
license: mit
---

# SciMult

SciMult is a pre-trained language model for scientific literature understanding. It is pre-trained on data from (extreme multi-label) paper classification, citation prediction, and literature retrieval tasks via a multi-task contrastive learning framework. For more details, please refer to the [paper](https://arxiv.org/abs/2305.14232).
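
At a high level, all three pre-training tasks can be cast into the same query-candidate form and trained with an in-batch contrastive objective. The snippet below is a minimal illustrative sketch of such an InfoNCE-style loss, not SciMult's released training code; the embedding dimension, temperature, and batch construction are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              cand_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """Illustrative InfoNCE loss: the i-th query's positive is the i-th candidate;
    all other candidates in the batch serve as in-batch negatives."""
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(cand_emb, dim=-1)
    logits = q @ c.T / temperature                 # (batch, batch) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for encoded queries and candidates.
queries = torch.randn(8, 768)    # e.g., paper titles + abstracts or search queries
positives = torch.randn(8, 768)  # e.g., matching labels, cited papers, or relevant papers
loss = in_batch_contrastive_loss(queries, positives)
print(loss.item())
```

Roughly speaking, the tasks differ mainly in what plays the role of the candidate: a label name for paper classification, a cited paper for citation prediction, or a relevant paper for literature retrieval.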

We release four variants of SciMult here:
- **scimult_vanilla.ckpt**
- **scimult_moe.ckpt**
- **scimult_moe_pmcpatients_par.ckpt**
- **scimult_moe_pmcpatients_ppr.ckpt**

**scimult_vanilla.ckpt** and **scimult_moe.ckpt** can be used for a wide range of scientific literature understanding tasks. The difference is that **scimult_vanilla.ckpt** adopts a standard 12-layer Transformer architecture (i.e., the same as [BERT base](https://huggingface.co/bert-base-uncased)), whereas **scimult_moe.ckpt** adopts a Mixture-of-Experts Transformer architecture with task-specific multi-head attention (MHA) sublayers. Experimental results show that **scimult_moe.ckpt** achieves better performance overall.
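
To make "task-specific MHA sublayers" concrete, the sketch below shows one plausible wiring: each pre-training task is routed to its own multi-head attention expert while the feed-forward sublayer is shared across tasks. This is an illustrative reconstruction of the idea described above, not SciMult's actual implementation; the module layout, dimensions, and hard task routing are assumptions.

```python
import torch
import torch.nn as nn

class TaskMoETransformerLayer(nn.Module):
    """Illustrative layer: one MHA expert per task, with a shared feed-forward sublayer."""
    def __init__(self, d_model=768, n_heads=12, n_tasks=3, d_ff=3072):
        super().__init__()
        self.mha_experts = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True) for _ in range(n_tasks)]
        )
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, task_id: int):
        # Hard routing: pick the attention expert that belongs to the current task.
        attn_out, _ = self.mha_experts[task_id](x, x, x)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ffn(x))

# Toy usage: the same input passes through different attention experts per task.
layer = TaskMoETransformerLayer()
tokens = torch.randn(2, 16, 768)        # (batch, seq_len, hidden)
out_cls = layer(tokens, task_id=0)      # e.g., classification expert
out_ret = layer(tokens, task_id=2)      # e.g., retrieval expert
```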

**scimult_moe_pmcpatients_par.ckpt** and **scimult_moe_pmcpatients_ppr.ckpt** are initialized from **scimult_moe.ckpt** and continually pre-trained on the training sets of the [PMC-Patients](https://github.com/pmc-patients/pmc-patients) patient-to-article retrieval (PAR) and patient-to-patient retrieval (PPR) tasks, respectively. As of October 2023, these two models rank 1st on their corresponding tasks on the [PMC-Patients Leaderboard](https://pmc-patients.github.io/).
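
The released files are ordinary PyTorch checkpoints, so a reasonable first step is to download one and inspect its contents. The snippet below is a sketch using the Hugging Face Hub client; the `repo_id` is a placeholder for this model repository, and no particular key layout inside the checkpoint is assumed.

```python
import torch
from huggingface_hub import hf_hub_download

# Placeholder repo id: replace with the actual id of this model repository.
ckpt_path = hf_hub_download(repo_id="<namespace>/SciMult", filename="scimult_moe.ckpt")

# Load as a regular PyTorch checkpoint and inspect its structure rather than
# assuming specific key names. weights_only=False requires PyTorch >= 1.13 and
# should only be used for checkpoints you trust.
checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=False)
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
```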

## Pre-training Data
SciMult is pre-trained on the following data:
- [MAPLE](https://github.com/yuzhimanhua/MAPLE) for paper classification
- [Citation Prediction Triplets](https://huggingface.co/datasets/allenai/scirepeval/viewer/cite_prediction) for link prediction
- [SciRepEval-Search](https://huggingface.co/datasets/allenai/scirepeval/viewer/search) for literature retrieval
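
For a quick look at the retrieval-related portions of this data, the SciRepEval subsets linked above are hosted as Hugging Face datasets and can typically be pulled with the `datasets` library. The config names below mirror the viewer links above; treat the exact names, available splits, and any `trust_remote_code` requirement as assumptions that may vary across dataset versions. MAPLE is distributed through its GitHub repository linked above.

```python
from datasets import load_dataset

# Config names follow the dataset viewer links above; depending on the
# `datasets` version, you may also need to pass trust_remote_code=True.
cite_prediction = load_dataset("allenai/scirepeval", "cite_prediction", split="train")
search = load_dataset("allenai/scirepeval", "search", split="train")

print(cite_prediction[0])  # citation-prediction triplet (query paper, positive, negative)
print(search[0])           # a search query paired with candidate papers
```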

## Citation
If you find SciMult useful in your research, please cite the following paper:
```bibtex
@article{zhang2023pre,
  title={Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding},
  author={Zhang, Yu and Cheng, Hao and Shen, Zhihong and Liu, Xiaodong and Wang, Ye-Yi and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2305.14232},
  year={2023}
}
```