---
license: apache-2.0
datasets:
- allenai/scicite
language:
- en
metrics:
- f1
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: zero-shot-classification
library_name: transformers
tags:
- scientometrics
- citation_analysis
- citation_intent_classification
---

# Qwen2.5-14B-CIC-SciCite

A fine-tuned model for Citation Intent Classification, based on [Qwen 2.5 14B Instruct](https://huggingface.co./Qwen/Qwen2.5-14B-Instruct) and trained on the [SciCite](https://huggingface.co./datasets/allenai/scicite) dataset.

GGUF Version: https://huggingface.co./sknow-lab/Qwen2.5-14B-CIC-SciCite-GGUF

## SciCite classes

| Class | Definition |
| --- | --- |
| Background information | The citation states, mentions, or points to background information that gives more context about a problem, concept, approach, or topic, or about the importance of the problem in the field. |
| Method | Making use of a method, tool, approach, or dataset. |
| Result comparison | Comparison of the paper's results/findings with the results/findings of other work. |

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sknow-lab/Qwen2.5-14B-CIC-SciCite"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

system_prompt = """
# CONTEXT #
You are an expert researcher tasked with classifying the intent of a citation in a scientific publication.

########

# OBJECTIVE #
You will be given a sentence containing a citation. You must classify the intent of the citation by assigning it to one of three predefined classes.

########

# CLASS DEFINITIONS #

The three (3) possible classes are the following: "background information", "method", "results comparison".

1 - background information: The citation states, mentions, or points to the background information giving more context about a problem, concept, approach, topic, or importance of the problem in the field.
2 - method: Making use of a method, tool, approach, or dataset.
3 - results comparison: Comparison of the paper's results/findings with the results/findings of other work.

########

# RESPONSE RULES #
- Analyze only the citation marked with the @@CITATION tag.
- Assign exactly one class to each citation.
- Respond only with the exact name of one of the following classes: "background information", "method", or "results comparison".
- Do not provide any explanation or elaboration.
"""

test_citing_sentence = "Activated PBMC are the basis of the standard PBMC blast assay for HIV-1 neutralization, whereas the various GHOST and HeLa cell lines have all been used in neutralization assays @@CITATION@@."

user_prompt = f"""
{test_citing_sentence}

### Question: Which is the most likely intent for this citation?
a) background information
b) method
c) results comparison

### Answer:
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Response: method
```

Details about the system prompts and query templates can be found in the paper.

There might be a need for a cleanup function to extract the predicted label from the output. You can find ours on [GitHub](https://github.com/athenarc/CitationIntentOpenLLM/blob/main/citation_intent_classification_experiments.py).

## Citation

```
@misc{koloveas2025llmspredictcitationintent,
      title={Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs},
      author={Paris Koloveas and Serafeim Chatzopoulos and Thanasis Vergoulis and Christos Tryfonopoulos},
      year={2025},
      eprint={2502.14561},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14561},
}
```
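
## Appendix: label cleanup sketch

Since the model may wrap the class name in extra text (e.g. "Answer: Method"), a small normalization step can help before scoring. The `extract_label` helper below is a hypothetical sketch for illustration, not the authors' implementation (their cleanup function is linked above); it simply matches the three class names case-insensitively:

```python
# Hypothetical helper for normalizing model output to one of the three
# SciCite class names; not the authors' implementation (see the GitHub link above).
LABELS = ["background information", "method", "results comparison"]

def extract_label(response: str):
    """Return the first known class name found in the response, or None."""
    text = response.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return None

print(extract_label("Answer: Method"))  # -> method
```

Returning `None` for unrecognized outputs lets the caller count them as classification failures rather than silently assigning a class.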