---
license: apache-2.0
datasets:
- kejian/ACL-ARC
language:
- en
metrics:
- f1
base_model:
- Qwen/Qwen2.5-14B-Instruct
library_name: transformers
tags:
- scientometrics
- citation_analysis
- citation_intent_classification
pipeline_tag: zero-shot-classification
---

# Qwen2.5-14B-CIC-ACLARC

A fine-tuned model for Citation Intent Classification, based on [Qwen 2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) and trained on the [ACL-ARC](https://huggingface.co/datasets/kejian/ACL-ARC) dataset.

GGUF Version: https://huggingface.co/sknow-lab/Qwen2.5-14B-CIC-ACLARC-GGUF

## ACL-ARC classes

| Class | Description |
| --- | --- |
| Background | The cited paper provides relevant background information or is part of the body of literature. |
| Motivation | The citing paper is directly motivated by the cited paper. |
| Uses | The citing paper uses the methodology or tools created by the cited paper. |
| Extends | The citing paper extends the methods, tools, data, etc. of the cited paper. |
| Comparison or Contrast | The citing paper expresses similarities to, differences from, or disagreement with the cited paper. |
| Future | The cited paper may be a potential avenue for future work. |

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sknow-lab/Qwen2.5-14B-CIC-ACLARC"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

system_prompt = """
# CONTEXT #
You are an expert researcher tasked with classifying the intent of a citation in a scientific publication.

########

# OBJECTIVE #
You will be given a sentence containing a citation, you must output the appropriate class as an answer.

########

# CLASS DEFINITIONS #

The six (6) possible classes are the following: "BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE".
The definitions of the classes are:

1 - BACKGROUND: The cited paper provides relevant Background information or is part of the body of literature.

2 - MOTIVATION: The citing paper is directly motivated by the cited paper.

3 - USES: The citing paper uses the methodology or tools created by the cited paper.

4 - EXTENDS: The citing paper extends the methods, tools or data, etc. of the cited paper.

5 - COMPARES_CONTRASTS: The citing paper expresses similarities or differences to, or disagrees with, the cited paper.

6 - FUTURE: The cited paper may be a potential avenue for future work.

########

# RESPONSE RULES #
- Analyze only the citation marked with the @@CITATION@@ tag.
- Assign exactly one class to each citation.
- Respond only with the exact name of one of the following classes: "BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE".
- Do not provide any explanation or elaboration.
"""

test_citing_sentence = "However , the method we are currently using in the ATIS domain ( @@CITATION@@ ) represents our most promising approach to this problem."

user_prompt = f"""
{test_citing_sentence}

### Question: Which is the most likely intent for this citation?
a) BACKGROUND
b) MOTIVATION
c) USES
d) EXTENDS
e) COMPARES_CONTRASTS
f) FUTURE

### Answer:
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Response: USES
```

Details about the system prompts and query templates can be found in the paper.
You may need a cleanup function to extract the predicted label from the model output; ours is available on [GitHub](https://github.com/athenarc/CitationIntentOpenLLM/blob/main/citation_intent_classification_experiments.py).

## Citation

```
@misc{koloveas2025llmspredictcitationintent,
      title={Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs},
      author={Paris Koloveas and Serafeim Chatzopoulos and Thanasis Vergoulis and Christos Tryfonopoulos},
      year={2025},
      eprint={2502.14561},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14561},
}
```