---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: text-generation
tags:
- counter speech
base_model: openai-community/gpt2-medium
---

---

# Target-Aware Counter-Speech Generation

The target-aware counter-speech generation model is an autoregressive language model based on [gpt2-medium](https://huggingface.co/gpt2-medium) and fine-tuned on hate-speech/counter-speech pairs from the [CONAN](https://github.com/marcoguerini/CONAN) datasets to generate more contextually relevant counter-speech.
The model uses special tokens that embed target-demographic information to steer generation toward relevant responses and away from off-topic or generic ones. It is trained on 8 target demographics: Migrants, People of Color (POC), LGBT+, Muslims, Women, Jews, Disabled, and Other.

## Uses

The model is intended to generate counter-speech responses to a given hate-speech sequence, with a special token in the prompt that encodes the target demographic.
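
A minimal sketch of how such a prompt could be assembled. It assumes each target token is the demographic name wrapped in angle brackets (as `<other>` is in the getting-started example) and reuses that example's `Hate-speech: ... Counter-speech: ` template; `build_prompt` and `TARGET_TOKENS` are hypothetical names, not part of the released model.

```python
# Hypothetical helper. Assumption: each target token is the demographic name in
# angle brackets, following the "<other>" token in the getting-started prompt.
TARGET_TOKENS = ["MIGRANTS", "POC", "LGBT+", "MUSLIMS", "WOMEN", "JEWS", "other", "DISABLED"]


def build_prompt(hate_speech: str, target: str) -> str:
    """Wrap a hate-speech text in the prompt template together with its target token."""
    if target not in TARGET_TOKENS:
        raise ValueError(f"unknown target {target!r}; expected one of {TARGET_TOKENS}")
    # The model is expected to continue the text after "Counter-speech: ".
    return f"<|endoftext|> <{target}> Hate-speech: {hate_speech.strip()} Counter-speech: "


prompt = build_prompt("Human are not created equal, some are born lesser.", "other")
print(prompt)
```

Validating the target up front keeps a misspelled demographic from silently producing a prompt whose target token the model never saw during fine-tuning.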


## Bias, Risks, and Limitations

We observed negative effects such as content hallucination and toxic response generation. Although the intended use is to generate counter-speech to combat online hatred, usage should be monitored carefully through human post-editing or an approval system to ensure a safe and inclusive online environment.
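
One way to enforce such an approval step is to hold every generated response in a review queue and publish only what a human moderator has approved or post-edited. The sketch below is a hypothetical, library-free illustration of that idea; `ReviewQueue`, `Candidate`, and their methods are invented for this example and are not part of the model or any library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Candidate:
    hate_speech: str
    counter_speech: str  # raw model output, not yet reviewed


@dataclass
class ReviewQueue:
    """Hypothetical gate: generated responses are published only after human review."""
    pending: Dict[int, Candidate] = field(default_factory=dict)
    published: List[str] = field(default_factory=list)
    next_id: int = 0

    def submit(self, hate_speech: str, counter_speech: str) -> int:
        # Every model output enters the queue as pending.
        cid = self.next_id
        self.pending[cid] = Candidate(hate_speech, counter_speech)
        self.next_id += 1
        return cid

    def approve(self, cid: int, edited_text: Optional[str] = None) -> str:
        # A moderator accepts the response, optionally post-editing it first.
        candidate = self.pending.pop(cid)
        text = edited_text if edited_text is not None else candidate.counter_speech
        self.published.append(text)
        return text

    def reject(self, cid: int) -> None:
        # Hallucinated or toxic responses are dropped and never published.
        del self.pending[cid]


queue = ReviewQueue()
first = queue.submit("<hate speech>", "<model response 1>")
second = queue.submit("<hate speech>", "<model response 2>")
queue.approve(first, edited_text="<post-edited response 1>")
queue.reject(second)
print(queue.published)  # ['<post-edited response 1>']
```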


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# All available target-demographic tokens
types = ["MIGRANTS", "POC", "LGBT+", "MUSLIMS", "WOMEN", "JEWS", "other", "DISABLED"]

model_id = "tum-nlp/gpt-2-medium-target-aware-counterspeech-generation"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"  # decoder-only models should be left-padded for generation
if tokenizer.pad_token is None:
    # GPT-2 has no padding token by default; reuse EOS so padding=True does not fail
    tokenizer.pad_token = tokenizer.eos_token

prompt = "<|endoftext|> <other> Hate-speech: Human are not created equal, some are born lesser. Counter-speech: "
inputs = tokenizer(prompt, return_tensors="pt", padding=True)
output_sequences = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    pad_token_id=tokenizer.eos_token_id,
    max_length=128,
    num_beams=3,
    no_repeat_ngram_size=3,
    num_return_sequences=1,
    early_stopping=True,
)
# generate returns a batch of sequences; decode the first (and only) one
result = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
print(result)
```
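
For a decoder-only model like GPT-2, `generate` returns the prompt tokens followed by the continuation, so the decoded text still contains the prompt. A minimal sketch for keeping only the generated response, assuming the `Counter-speech:` marker (plain text, not a special token) survives decoding; `extract_counter_speech` is a hypothetical helper name.

```python
def extract_counter_speech(decoded: str, marker: str = "Counter-speech:") -> str:
    """Return the generated response that follows `marker` in the decoded text.

    Hypothetical helper: assumes the prompt's marker appears once and the
    model's continuation follows it. Without the marker, the whole stripped
    text is returned.
    """
    head, sep, tail = decoded.partition(marker)
    return tail.strip() if sep else head.strip()


# Illustrative string, not actual model output.
decoded = (
    "Hate-speech: Human are not created equal, some are born lesser. "
    "Counter-speech: An illustrative reply."
)
print(extract_counter_speech(decoded))  # An illustrative reply.
```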


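#### Training Hyperparameters

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",  # hypothetical path; not specified in the original card
    num_train_epochs=20,
    learning_rate=3.800568576836524e-05,
    weight_decay=0.050977894796868116,
    warmup_ratio=0.10816909354342182,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    save_strategy="epoch",
    save_total_limit=3,
    load_best_model_at_end=True,
    auto_find_batch_size=True,
)
```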
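
With `lr_scheduler_type="cosine"` and `warmup_ratio`, the learning rate rises linearly from 0 to `learning_rate` during warmup and then follows a half-cosine decay to 0. The plain-Python sketch below reproduces that shape, assuming it mirrors `transformers.get_cosine_schedule_with_warmup` with its default half cycle and that warmup lasts `ceil(warmup_ratio * total_steps)` steps. The 1,000-step total is hypothetical: the real number of optimizer steps depends on the dataset size and on the batch size chosen by `auto_find_batch_size`.

```python
import math

# Values copied from the TrainingArguments above.
LEARNING_RATE = 3.800568576836524e-05
WARMUP_RATIO = 0.10816909354342182


def cosine_lr(step: int, total_steps: int,
              base_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup followed by half-cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: the factor falls from 1 after warmup to 0 at the end.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = 1000  # hypothetical number of optimizer steps
for step in (0, 50, 109, 500, 1000):
    print(step, f"{cosine_lr(step, total):.3e}")
```

With 1,000 total steps, warmup covers the first 109 steps, which is where the peak learning rate is reached.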


## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model's performance is tested on three test sets: two subsets of the [CONAN](https://github.com/marcoguerini/CONAN) dataset and the sexist portion of the [EDOS](https://github.com/rewire-online/edos) dataset.

#### Metrics

The model's performance is measured with a custom evaluation pipeline for counter-speech generation. The pipeline includes CoLA, Toxicity (TOX), Hatefulness (Hate), Offensiveness (OFF), Label and Context Similarity (L Sim, C Sim), Validity as Counter-Speech (VaCS), Repetition Rate (RR), target-demographic F1 (F1), and the Arithmetic Mean (AM).
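
The AM column appears to be the unweighted arithmetic mean of the other metric columns in a row: averaging the nine target-aware gpt2-medium scores on CONAN gives 0.848, and the same calculation matches, within rounding, the AM reported for that model on CONAN SMALL (0.890) and EDOS (0.816). The sketch below illustrates the calculation under that assumption; it is not the authors' evaluation code.

```python
from statistics import fmean

# Scores of target-aware gpt2-medium on CONAN, copied from the Results table
# (all metric columns except AM itself).
conan_scores = {
    "CoLA": 0.958, "TOX": 0.946, "Hate": 1.000, "OFF": 0.996, "L Sim": 0.706,
    "C Sim": 0.784, "VaCS": 0.946, "RR": 0.419, "F1": 0.880,
}


def arithmetic_mean(scores: dict) -> float:
    """Unweighted mean of the per-metric scores (assumed definition of AM)."""
    return fmean(list(scores.values()))


print(round(arithmetic_mean(conan_scores), 3))  # 0.848
```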


### Results

#### CONAN

| Model Name | CoLA | TOX | Hate | OFF | L Sim | C Sim | VaCS | RR | F1 | AM |
| ---------- | ---- | --- | ---- | --- | ----- | ----- | ---- | -- | -- | -- |
| Human | 0.937 | 0.955 | 1.000 | 0.997 | - | 0.751 | 0.980 | 0.861 | 0.885 | 0.929 |
| target-aware gpt2-medium | 0.958 | 0.946 | 1.000 | 0.996 | 0.706 | 0.784 | 0.946 | 0.419 | 0.880 | 0.848 |

#### CONAN SMALL

| Model Name | CoLA | TOX | Hate | OFF | L Sim | C Sim | VaCS | RR | F1 | AM |
| ---------- | ---- | --- | ---- | --- | ----- | ----- | ---- | -- | -- | -- |
| Human | 0.963 | 0.956 | 1.000 | 1.000 | 1.000 | 0.768 | 0.988 | 0.995 | 0.868 | 0.949 |
| target-aware gpt2-medium | 0.975 | 0.931 | 1.000 | 1.000 | 0.728 | 0.783 | 0.888 | 0.911 | 0.792 | 0.890 |

#### EDOS

| Model Name | CoLA | TOX | Hate | OFF | C Sim | VaCS | RR | F1 | AM |
| ---------- | ---- | --- | ---- | --- | ----- | ---- | -- | -- | -- |
| target-aware gpt2-medium | 0.930 | 0.815 | 0.999 | 0.975 | 0.689 | 0.857 | 0.518 | 0.747 | 0.816 |