---
library_name: transformers
tags:
- citation
- text-classification
- science
license: apache-2.0
language:
- af
- am
- ar
- as
- az
- be
- bg
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- he
- hi
- hr
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- 'no'
- om
- or
- pa
- pl
- ps
- pt
- ro
- ru
- sa
- sd
- si
- sk
- sl
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- ug
- uk
- ur
- uz
- vi
- xh
- yi
- zh
base_model:
- distilbert/distilbert-base-multilingual-cased
---
# Citation Pre-Screening
A multilingual **DistilBERT** classifier that pre-screens raw citation strings, labelling each as a valid or invalid citation.
## Overview
<details>
<summary>Click to expand</summary>

- **Model type:** Language Model
- **Architecture:** DistilBERT
- **Language:** Multilingual
- **License:** Apache 2.0
- **Task:** Binary Classification (Citation Pre-Screening)
- **Dataset:** SIRIS-Lab/citation-parser-TYPE
- **Additional Resources:**
  - [GitHub](https://github.com/sirisacademic/citation-parser)

</details>
## Model description
The **Citation Pre-Screening** model is part of the [`Citation Parser`](https://github.com/sirisacademic/citation-parser) package and is fine-tuned to classify citation texts as valid or invalid. Based on **DistilBERT**, it is designed for automated citation-processing workflows and serves as the pre-screening component of the **Citation Parser** tool for citation metadata extraction and validation.
The model was trained on a dataset of citation texts labelled `True` (valid citation) or `False` (invalid citation). The dataset contains 3599 training samples and 400 test samples, each consisting of citation-related text and its label.
Fine-tuning started from the **distilbert-base-multilingual-cased** checkpoint, so the model can handle multilingual input, although it was evaluated on English citation data.
## Intended Usage
This model classifies raw citation text as either a valid or an invalid citation. It is well suited to automating the pre-screening step in citation databases or manuscript-processing workflows.
## How to use
```python
from transformers import pipeline
# Load the model
citation_classifier = pipeline("text-classification", model="sirisacademic/citation-pre-screening")
# Example citation text
citation_text = "MURAKAMI, H等: 'Unique thermal behavior of acrylic PSAs bearing long alkyl side groups and crosslinked by aluminum chelate', 《EUROPEAN POLYMER JOURNAL》"
# Classify the citation
result = citation_classifier(citation_text)
print(result)
```
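The pipeline returns a list of dictionaries with `label` and `score` fields. The exact label strings depend on the model's `id2label` configuration; the training data used `True` for a valid citation and `False` for an invalid one.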
## Training
The model was trained using the **Citation Pre-Screening Dataset** consisting of:
- **Training data**: 3599 samples
- **Test data**: 400 samples
The following hyperparameters were used for training:
- **Model Path**: `distilbert/distilbert-base-multilingual-cased`
- **Batch Size**: 32
- **Number of Epochs**: 4
- **Learning Rate**: 2e-5
- **Max Sequence Length**: 512
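The card does not include a training script, but a minimal fine-tuning sketch with these hyperparameters, using the Transformers `Trainer` API, might look as follows. The dataset column names (`text`, `label`) and split names (`train`, `test`) are assumptions and may need to be adapted to the actual layout of `SIRIS-Lab/citation-parser-TYPE`.
```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_PATH = "distilbert/distilbert-base-multilingual-cased"

# Column and split names below are assumptions about the dataset layout.
dataset = load_dataset("SIRIS-Lab/citation-parser-TYPE")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

def tokenize(batch):
    # Truncate to the 512-token maximum used during training.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH, num_labels=2)

training_args = TrainingArguments(
    output_dir="citation-pre-screening",
    per_device_train_batch_size=32,
    num_train_epochs=4,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)

trainer.train()
print(trainer.evaluate())
```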
## Evaluation Metrics
The model's performance was evaluated on the test set, and the following results were obtained:
| Metric | Value |
|----------------------|--------|
| **Accuracy** | 0.95 |
| **Macro avg F1** | 0.94 |
| **Weighted avg F1** | 0.95 |
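The metric names match those produced by scikit-learn's `classification_report`. A minimal sketch of recomputing comparable figures on held-out data is shown below; the example texts and gold labels are purely illustrative, and the gold label strings must match the model's `id2label` mapping.
```python
from sklearn.metrics import classification_report
from transformers import pipeline

# Illustrative examples only; in practice, iterate over the test split
# of the citation pre-screening dataset.
texts = [
    "Smith J. et al., 'A study of example systems', Journal of Examples, 2021",
    "lorem ipsum dolor sit amet",
]
gold = ["True", "False"]  # must match the model's id2label strings

classifier = pipeline("text-classification", model="sirisacademic/citation-pre-screening")
predictions = [output["label"] for output in classifier(texts, truncation=True)]

# Reports per-class precision/recall/F1 plus accuracy, macro avg and weighted avg.
print(classification_report(gold, predictions, digits=2))
```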
## Additional information
### Authors
- SIRIS Lab, Research Division of SIRIS Academic.
### License
This work is distributed under an [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
### Contact
For further information, send an email to either [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).