|
--- |
|
license: mit |
|
tags: |
|
- generated_from_trainer |
|
metrics: |
|
- accuracy |
|
- f1 |
|
- precision |
|
- recall |
|
model-index: |
|
- name: toxicity-target-type-identification |
|
results: [] |
|
datasets: |
|
- dougtrajano/olid-br |
|
language: |
|
- pt |
|
library_name: transformers |
|
--- |
|
|
|
# toxicity-target-type-identification |
|
|
|
Toxicity Target Type Identification is a model that classifies the type of target (individual, group, or other) in a given targeted toxic text.
|
|
|
This BERT model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co./neuralmind/bert-base-portuguese-cased) on the [OLID-BR dataset](https://huggingface.co./datasets/dougtrajano/olid-br). |
|
|
|
## Overview |
|
|
|
**Input:** Text in Brazilian Portuguese |
|
|
|
**Output:** Multiclass classification (individual, group, or other) |
|
|
|
## Usage |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForSequenceClassification |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("dougtrajano/toxicity-target-type-identification") |
|
|
|
model = AutoModelForSequenceClassification.from_pretrained("dougtrajano/toxicity-target-type-identification") |
|
``` |
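
Below is a minimal inference sketch. The example text is hypothetical, and the printed label assumes the model's `id2label` mapping matches the classes reported under Performance (`INDIVIDUAL`, `GROUP`, `OTHER`).

```python
import torch

text = "Exemplo de texto em português do Brasil."  # hypothetical input

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class and map it back to its label name.
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])  # e.g. "INDIVIDUAL", "GROUP", or "OTHER"
```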
|
|
|
## Limitations and bias |
|
|
|
The following factors may degrade the model’s performance. |
|
|
|
**Text Language**: The model was trained on Brazilian Portuguese texts, so it may not work well with other varieties of Portuguese (e.g., European Portuguese).
|
|
|
**Text Origin**: The model was trained mostly on social media texts, with a few texts from other sources, so it may not work well on other types of text.
|
|
|
## Trade-offs |
|
|
|
Models can exhibit performance issues under particular circumstances. This section describes situations in which the model may perform less than optimally, so that you can plan accordingly.
|
|
|
**Text Length**: The model was fine-tuned on texts with a word count between 1 and 178 words (average of 18 words). It may give poor results on texts with a word count outside this range. |
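
If you need a guard for this, a simple hypothetical pre-check on word count might look like:

```python
def within_training_range(text: str, min_words: int = 1, max_words: int = 178) -> bool:
    """Return True if the word count falls inside the model's fine-tuning range."""
    return min_words <= len(text.split()) <= max_words
```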
|
|
|
## Performance |
|
|
|
The model was evaluated on the test set of the [OLID-BR](https://dougtrajano.github.io/olid-br/) dataset. |
|
|
|
**Accuracy:** 0.7505 |
|
|
|
**Precision:** 0.7812 |
|
|
|
**Recall:** 0.7505 |
|
|
|
**F1-Score:** 0.7603 |
|
|
|
| Class | Precision | Recall | F1-Score | Support | |
|
| :---: | :-------: | :----: | :------: | :-----: | |
|
| `INDIVIDUAL` | 0.8850 | 0.7964 | 0.8384 | 609 | |
|
| `GROUP` | 0.6766 | 0.6385 | 0.6570 | 213 | |
|
| `OTHER` | 0.4518 | 0.7177 | 0.5545 | 124 | |
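
The card does not include the evaluation script; per-class metrics like these are typically produced with scikit-learn's `classification_report`, as in the sketch below (the gold and predicted labels are hypothetical, for illustration only).

```python
from sklearn.metrics import classification_report

# Hypothetical gold and predicted labels for illustration only.
y_true = ["INDIVIDUAL", "GROUP", "OTHER", "INDIVIDUAL", "GROUP"]
y_pred = ["INDIVIDUAL", "GROUP", "INDIVIDUAL", "INDIVIDUAL", "OTHER"]

print(classification_report(y_true, y_pred, labels=["INDIVIDUAL", "GROUP", "OTHER"]))
```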
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
|
|
- learning_rate: 3.952388499692274e-05 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- seed: 1993 |
|
- optimizer: Adam with betas=(0.9944095815441554,0.8750000522553327) and epsilon=1.8526084265228802e-07 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 30 |
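
For reference, here is a sketch of how these values map onto `transformers.TrainingArguments` (an assumption about the setup; the card does not include the actual training script):

```python
from transformers import TrainingArguments

# Assumed mapping of the reported hyperparameters onto the standard Trainer API.
training_args = TrainingArguments(
    output_dir="toxicity-target-type-identification",
    learning_rate=3.952388499692274e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=1993,
    adam_beta1=0.9944095815441554,
    adam_beta2=0.8750000522553327,
    adam_epsilon=1.8526084265228802e-07,
    lr_scheduler_type="linear",
    num_train_epochs=30,
)
```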
|
|
|
### Framework versions |
|
|
|
- Transformers 4.26.1 |
|
- Pytorch 1.10.2+cu113 |
|
- Datasets 2.9.0 |
|
- Tokenizers 0.13.2 |
|
|
|
## Provide Feedback |
|
|
|
If you have any feedback on this model, please [open an issue](https://github.com/DougTrajano/ToChiquinho/issues/new) on GitHub. |