---
license: apache-2.0
datasets:
- allenai/scicite
language:
- en
metrics:
- f1
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: zero-shot-classification
library_name: transformers
tags:
- scientometrics
- citation_analysis
- citation_intent_classification
---


# Qwen2.5-14B-CIC-SciCite

A fine-tuned model for Citation Intent Classification, based on [Qwen 2.5 14B Instruct](https://huggingface.co./Qwen/Qwen2.5-14B-Instruct) and trained on the [SciCite](https://huggingface.co./datasets/allenai/scicite) dataset.

GGUF Version: https://huggingface.co./sknow-lab/Qwen2.5-14B-CIC-SciCite-GGUF

## SciCite classes
| Class | Definition |
| --- | --- |
| Background information | The citation states, mentions, or points to the background information giving more context about a problem, concept, approach, topic, or importance of the problem in the field. |
| Method | Making use of a method, tool, approach or dataset. |
| Result comparison | Comparison of the paper’s results/findings with the results/findings of other work. |
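These class names also appear, lowercased, in the prompts used in the Quickstart below; note that the prompt uses "results comparison" (plural) where the table says "Result comparison". If you post-process model outputs, it may help to keep the expected label strings in one place. This is just a convenience sketch, not part of the released code:

```python
# Label strings exactly as they appear in the system/user prompts below.
SCICITE_LABELS = ["background information", "method", "results comparison"]
```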

## Quickstart

```python 
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sknow-lab/Qwen2.5-14B-CIC-SciCite"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

system_prompt = """
# CONTEXT #
You are an expert researcher tasked with classifying the intent of a citation in a scientific publication.

########

# OBJECTIVE # 
You will be given a sentence containing a citation. You must classify the intent of the citation by assigning it to one of three predefined classes.

########

# CLASS DEFINITIONS #
The three (3) possible classes are the following: "background information", "method", "results comparison."

1 - background information: The citation states, mentions, or points to the background information giving more context about a problem, concept, approach, topic, or importance of the problem in the field.
2 - method: Making use of a method, tool, approach, or dataset.
3 - results comparison: Comparison of the paper’s results/findings with the results/findings of other work.

########

# RESPONSE RULES #
- Analyze only the citation marked with the @@CITATION tag.
- Assign exactly one class to each citation.
- Respond only with the exact name of one of the following classes: "background information", "method", or "results comparison".
- Do not provide any explanation or elaboration.
"""

test_citing_sentence = "Activated PBMC are the basis of the standard PBMC blast assay for HIV-1 neutralization, whereas the various GHOST and HeLa cell lines have all been used in neutralization assays @@CITATION@@."

user_prompt = f"""
{test_citing_sentence}
### Question: Which is the most likely intent for this citation?
a) background information
b) method
c) results comparison 
### Answer:
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Response: method
```

Details about the system prompts and query templates can be found in the paper. 

Depending on the generation output, you may need a cleanup function to extract the predicted label. You can find ours on [GitHub](https://github.com/athenarc/CitationIntentOpenLLM/blob/main/citation_intent_classification_experiments.py).
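As a rough illustration only (the actual function used in the paper's experiments is the one linked above), a minimal normalization step could look like the sketch below; the fallback behaviour here is an assumption:

```python
def extract_label(response: str) -> str:
    """Map a raw model response to one of the three SciCite class names.

    Minimal sketch; not the cleanup function from the linked repository.
    """
    labels = ("background information", "method", "results comparison")
    text = response.strip().lower()
    # Prefer an exact match, then fall back to a substring match.
    for label in labels:
        if text == label:
            return label
    for label in labels:
        if label in text:
            return label
    # Nothing matched: return the normalized text so the caller can inspect it.
    return text


print(extract_label("Method"))                     # -> "method"
print(extract_label("a) background information"))  # -> "background information"
```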

## Citation

```
@misc{koloveas2025llmspredictcitationintent,
      title={Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs}, 
      author={Paris Koloveas and Serafeim Chatzopoulos and Thanasis Vergoulis and Christos Tryfonopoulos},
      year={2025},
      eprint={2502.14561},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14561}, 
}
```