---
library_name: peft
base_model: meta-llama/Llama-2-7b-hf
language:
- en
pipeline_tag: text-generation
tags:
- hate-speech
- explanation-generation
---
# Model Card for gllama-alarm-implicit-hate
**GLlama Alarm** is a suite of knowledge-Guided versions of Llama 2 instruction fine-tuned for non-binary abusive language detection and explanation generation tasks.
## Model Details
This version has been instruction fine-tuned on the Implicit Hate Corpus for multi-class expressiveness detection and explanation generation (i.e., classifying text as implicit hate speech, explicit hate speech, or not hate), as well as on encyclopedic, commonsense and temporal linguistic knowledge.
### Model Description
- **Developed by:** Chiara Di Bonaventura, Lucia Siciliani, Pierpaolo Basile
- **Funded by:** The Alan Turing Institute, Fondazione FAIR
- **Language:** English
- **Finetuned from model:** meta-llama/Llama-2-7b-hf
### Model Sources
- **Paper:** https://kclpure.kcl.ac.uk/ws/portalfiles/portal/316198577/2025_COLING_from_detection_to_explanation.pdf
## Uses
**GLlama Alarm** is intended for research use in English, especially for NLP tasks in the domain of social media, which might contain offensive content.
Our suite can be used to **detect different levels of offensiveness and expressiveness of abusive language** (e.g., offensive comments or implicit hate speech, the latter of which has proven hard for many LLMs) and to **generate structured textual explanations** of why a text contains abusive language.
As with any language model, GLlama Alarm can potentially be used to generate harmful language. It should not be deployed in any application without a prior assessment of the safety and fairness concerns specific to that application.
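For research use, a minimal loading-and-inference sketch is shown below. The adapter repository id is a placeholder and the prompt wording is illustrative only (the actual knowledge-guided template is given in Table 9 of the paper); adapt both to your setup.

```python
# Minimal inference sketch. ASSUMPTIONS: the adapter repo id is a placeholder,
# and the prompt below is illustrative -- the real knowledge-guided template
# is given in Table 9 of the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "dibo/gllama-alarm-implicit-hate"  # placeholder: use the actual repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

prompt = (
    "Classify the following text as implicit hate speech, explicit hate speech, "
    "or not hate, and explain your decision.\nText: <your text here>\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```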
## Training Details
**GLlama Alarm** builds on top of the foundational model Llama 2 (7B), which is an auto-regressive language model that uses an optimized transformer architecture.
Llama 2 was trained between January 2023 and July 2023 on a mix of publicly available online data; we selected the base version, which has 7B parameters.
We instruction fine-tuned Llama 2 separately on two datasets, HateXplain and the Implicit Hate Corpus. This version is the one instruction fine-tuned on the Implicit Hate Corpus.
These datasets contain publicly available data designed for hate speech detection, thus ensuring data privacy and protection.
To instruction fine-tune Llama 2, we created knowledge-guided prompts following our paradigm. The template is shown in Table 9 of the paper.
We instruction fine-tuned Llama 2 for 5 epochs on 17k knowledge-guided prompts for HateXplain and Implicit Hate, setting the remaining hyperparameters as suggested by [Taori et al., 2023](https://github.com/tatsu-lab/stanford_alpaca).
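For orientation, a schematic LoRA fine-tuning setup is sketched below. Only the 5 epochs are stated in this card; the LoRA hyperparameters, learning rate, and batch size are assumptions in the spirit of the Alpaca-style defaults cited above, and dataset preparation is omitted.

```python
# Schematic LoRA fine-tuning setup. ASSUMPTIONS: LoRA hyperparameters,
# learning rate, and batch size follow common Alpaca-style defaults;
# only the 5 epochs are stated in this card.
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,                                   # assumed rank
    lora_alpha=16,                         # assumed scaling
    lora_dropout=0.05,                     # assumed dropout
    target_modules=["q_proj", "v_proj"],   # common choice for Llama attention
)
model = get_peft_model(base, lora_config)

training_args = TrainingArguments(
    output_dir="gllama-alarm-implicit-hate",
    num_train_epochs=5,                # as stated in this card
    learning_rate=2e-5,                # assumed
    per_device_train_batch_size=4,     # assumed
    fp16=True,
)
# A Trainer over the 17k knowledge-guided prompts would follow here.
```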
## Citation
**BibTeX:**
    @inproceedings{dibonaventura2025gllama_alarm,
      title={From Detection to Explanation: Effective Learning Strategies for LLMs in Online Abusive Language Research},
      author={Di Bonaventura, Chiara and Siciliani, Lucia and Basile, Pierpaolo and Merono-Penuela, Albert and McGillivray, Barbara},
      booktitle={Proceedings of the 2025 International Conference on Computational Linguistics (COLING 2025)},
      year={2025}
    }
**APA:**
Di Bonaventura, C., Siciliani, L., Basile, P., Merono-Penuela, A., & McGillivray, B. (2025). From Detection to Explanation: Effective Learning Strategies for LLMs in Online Abusive Language Research. In *Proceedings of the 2025 International Conference on Computational Linguistics (COLING 2025)*.
## Model Card Contact
chiara.di[email protected]
### Framework versions
- PEFT 0.10.0