---
license: apache-2.0
datasets:
- kejian/ACL-ARC
language:
- en
metrics:
- f1
base_model:
- Qwen/Qwen2.5-14B-Instruct
library_name: transformers
tags:
- scientometrics
- citation_analysis
- citation_intent_classification
pipeline_tag: zero-shot-classification
---

# Qwen2.5-14B-CIC-ACLARC

A fine-tuned model for Citation Intent Classification, based on [Qwen 2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) and trained on the [ACL-ARC](https://huggingface.co/datasets/kejian/ACL-ARC) dataset.

GGUF Version: https://huggingface.co/sknow-lab/Qwen2.5-14B-CIC-ACLARC-GGUF

## ACL-ARC classes
| Class | Description |
| --- | --- |
| Background | The cited paper provides relevant background information or is part of the body of literature. |
| Motivation | The citing paper is directly motivated by the cited paper. |
| Uses | The citing paper uses the methodology or tools created by the cited paper. |
| Extends | The citing paper extends the methods, tools, data, etc. of the cited paper. |
| Comparison or Contrast | The citing paper expresses similarities to, differences from, or disagreement with the cited paper. |
| Future | The cited paper may be a potential avenue for future work. |

## Quickstart

```python 
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sknow-lab/Qwen2.5-14B-CIC-ACLARC"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

system_prompt = """
# CONTEXT #
You are an expert researcher tasked with classifying the intent of a citation in a scientific publication.

########

# OBJECTIVE # 
You will be given a sentence containing a citation, you must output the appropriate class as an answer.

########

# CLASS DEFINITIONS #

The six (6) possible classes are the following: "BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE".

The definitions of the classes are:
1 - BACKGROUND: The cited paper provides relevant Background information or is part of the body of literature.
2 - MOTIVATION: The citing paper is directly motivated by the cited paper.
3 - USES: The citing paper uses the methodology or tools created by the cited paper.
4 - EXTENDS: The citing paper extends the methods, tools or data, etc. of the cited paper.
5 - COMPARES_CONTRASTS: The citing paper expresses similarities or differences to, or disagrees with, the cited paper.
6 - FUTURE: The cited paper may be a potential avenue for future work.

########

# RESPONSE RULES #
- Analyze only the citation marked with the @@CITATION@@ tag.
- Assign exactly one class to each citation.
- Respond only with the exact name of one of the following classes: "BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE".
- Do not provide any explanation or elaboration.
"""

test_citing_sentence = "However , the method we are currently using in the ATIS domain ( @@CITATION@@ ) represents our most promising approach to this problem."

user_prompt = f"""
{test_citing_sentence}
### Question: Which is the most likely intent for this citation?
a) BACKGROUND
b) MOTIVATION
c) USES
d) EXTENDS
e) COMPARES_CONTRASTS
f) FUTURE
### Answer:
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Response: USES
```

Details about the system prompts and query templates can be found in the paper. 

You may need a cleanup function to extract the predicted label from the model output. You can find ours on [GitHub](https://github.com/athenarc/CitationIntentOpenLLM/blob/main/citation_intent_classification_experiments.py).
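As a rough illustration (not the authors' implementation linked above), a minimal cleanup could scan the generated text for the first occurrence of one of the six known labels:

```python
import re
from typing import Optional

# The six ACL-ARC intent labels the model is expected to emit.
LABELS = {"BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE"}

def extract_label(output: str) -> Optional[str]:
    """Return the first known label found in the model output, or None.

    Scans maximal runs of uppercase letters/underscores so that e.g.
    'COMPARES_CONTRASTS' is matched as one token rather than split.
    """
    for match in re.finditer(r"[A-Z_]+", output):
        if match.group(0) in LABELS:
            return match.group(0)
    return None
```

This handles outputs that wrap the label in extra text (e.g. `"Answer: USES."`); for production use, refer to the linked repository.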

## Citation

```
@misc{koloveas2025llmspredictcitationintent,
      title={Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs}, 
      author={Paris Koloveas and Serafeim Chatzopoulos and Thanasis Vergoulis and Christos Tryfonopoulos},
      year={2025},
      eprint={2502.14561},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14561}, 
}
```