|
--- |
|
library_name: peft |
|
base_model: meta-llama/Llama-2-7b-hf |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- hate-speech |
|
- explanation-generation |
|
--- |
|
|
|
# Model Card for gllama-alarm-implicit-hate |
|
|
|
**GLlama Alarm** is a suite of knowledge-Guided versions of Llama 2, instruction fine-tuned for non-binary abusive language detection and explanation generation tasks.
|
|
|
|
|
## Model Details |
|
|
|
This version has been instruction fine-tuned on the Implicit Hate Corpus for multi-class expressiveness detection (i.e., implicit hate speech, explicit hate speech, not hate) and explanation generation, as well as on encyclopedic, commonsense and temporal linguistic knowledge.
|
|
|
### Model Description |
|
|
|
- **Developed by:** Chiara Di Bonaventura, Lucia Siciliani, Pierpaolo Basile |
|
- **Funded by:** The Alan Turing Institute, Fondazione FAIR |
|
- **Language:** English |
|
- **Finetuned from model:** meta-llama/Llama-2-7b-hf |
|
|
|
### Model Sources |
|
|
|
- **Paper:** https://kclpure.kcl.ac.uk/ws/portalfiles/portal/316198577/2025_COLING_from_detection_to_explanation.pdf |
|
|
|
|
|
## Uses |
|
|
|
**GLlama Alarm** is intended for research use in English, especially for NLP tasks on social media content, which may contain offensive material.

Our suite can be used to **detect different levels of offensiveness and expressiveness of abusive language** (e.g., offensive comments and implicit hate speech, which has proven hard for many LLMs) and to **generate structured textual explanations** of why a text contains abusive language.
|
|
|
As with any language model, GLlama Alarm can potentially be used to generate harmful content. It should not be used directly in any application without a prior assessment of the safety and fairness concerns specific to that application.
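Since this release is a PEFT adapter on top of `meta-llama/Llama-2-7b-hf`, it can be loaded with 🤗 Transformers and PEFT. The sketch below is a minimal example, not an official snippet: the Hub id `<org>/gllama-alarm-implicit-hate` is a hypothetical placeholder, and the Alpaca-style prompt is only an illustrative stand-in for the actual knowledge-guided template shown in Table 9 of the paper.

```python
# Minimal inference sketch. Assumptions: the adapter lives at a hypothetical
# Hub id "<org>/gllama-alarm-implicit-hate", and the prompt below is an
# illustrative Alpaca-style stand-in for the paper's template (Table 9).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "<org>/gllama-alarm-implicit-hate"  # hypothetical; replace with the real repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the PEFT adapter
model.eval()

prompt = (
    "### Instruction:\n"
    "Classify the post as implicit hate speech, explicit hate speech, or "
    "not hate, and explain your decision.\n\n"
    "### Input:\n<post text>\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```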
|
|
|
|
|
## Training Details |
|
|
|
**GLlama Alarm** builds on top of the foundational model Llama 2 (7B), which is an auto-regressive language model that uses an optimized transformer architecture. |
|
Llama 2 was trained between January 2023 and July 2023 on a mix of publicly available online data. We select the base version of Llama 2, which has 7B parameters.
|
We instruction fine-tuned Llama 2 on two datasets separately: HateXplain and the Implicit Hate Corpus. This version is the one instruction fine-tuned on the Implicit Hate Corpus.
|
These datasets contain publicly available data designed for hate speech detection, thus ensuring data privacy and protection. |
|
To instruction fine-tune Llama 2, we created knowledge-guided prompts following our paradigm. The template is shown in Table 9 of the paper. |
|
We instruction fine-tuned Llama 2 for 5 epochs with 17k knowledge-guided prompts for HateXplain and Implicit Hate, setting the remaining hyperparameters as suggested by [Taori et al., 2023](https://github.com/tatsu-lab/stanford_alpaca).
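For orientation, the sketch below shows what such a PEFT instruction-tuning run can look like with 🤗 Transformers. It is illustrative only, not the authors' training script: the LoRA settings (`r`, `lora_alpha`, target modules) and batch sizes are assumptions, while the 5 epochs come from the text above and the 2e-5 learning rate follows Taori et al.'s reported setting for the 7B model.

```python
# Illustrative fine-tuning sketch, not the authors' released script.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 defines no pad token

model = AutoModelForCausalLM.from_pretrained(base_id)
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,           # assumed values
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))

# Placeholder for the ~17k knowledge-guided prompts (template in Table 9).
train_data = Dataset.from_list([{"text": "<knowledge-guided prompt + target>"}])
train_data = train_data.map(lambda ex: tokenizer(ex["text"], truncation=True))

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gllama-alarm-implicit-hate",
        num_train_epochs=5,              # as stated above
        learning_rate=2e-5,              # Taori et al. (2023) setting for 7B
        per_device_train_batch_size=4,   # assumed
        gradient_accumulation_steps=8,   # assumed
    ),
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```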
|
|
|
|
|
## Citation |
|
|
|
|
**BibTeX:** |
|
|
|
    @inproceedings{dibonaventura2025gllama_alarm,
      title={From Detection to Explanation: Effective Learning Strategies for LLMs in Online Abusive Language Research},
      author={Di Bonaventura, Chiara and Siciliani, Lucia and Basile, Pierpaolo and Merono-Penuela, Albert and McGillivray, Barbara},
      booktitle={Proceedings of the 2025 International Conference on Computational Linguistics (COLING 2025)},
      year={2025}
    }
|
|
|
|
|
|
|
**APA:** |
|
|
|
Di Bonaventura, C., Siciliani, L., Basile, P., Merono-Penuela, A., & McGillivray, B. (2025). From Detection to Explanation: Effective Learning Strategies for LLMs in Online Abusive Language Research. In *Proceedings of the 2025 International Conference on Computational Linguistics (COLING 2025)*.
|
|
|
|
|
## Model Card Contact |
|
|
|
chiara.di[email protected] |
|
|
|
### Framework versions |
|
|
|
- PEFT 0.10.0 |