---
library_name: transformers
license: mit
datasets:
- coltekin/offenseval2020_tr
language:
- tr
pipeline_tag: text-classification
---
atasoglu/turkish-base-bert-uncased-offenseval2020_tr
This is a Turkish offensive language detection model, fine-tuned from ytu-ce-cosmos/turkish-base-bert-uncased on the coltekin/offenseval2020_tr dataset.
Usage
Quick usage:
```python
from transformers import pipeline

pipe = pipeline("text-classification", "atasoglu/turkish-base-bert-uncased-offenseval2020_tr")
# top_k=None returns the scores for all labels instead of only the best one
print(pipe("bu bir test metnidir.", top_k=None))
# [{'label': 'NOT', 'score': 0.9970345497131348}, {'label': 'OFF', 'score': 0.0029654440004378557}]
```
Or, calling the tokenizer and model directly:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Run on GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "atasoglu/turkish-base-bert-uncased-offenseval2020_tr"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)
model.eval()  # inference mode (disables dropout)


@torch.no_grad()  # gradients are not needed for inference
def predict(X):
    # Tokenize a list of strings: pad to the longest input, truncate at 256 tokens.
    inputs = tokenizer(X, padding=True, truncation=True, max_length=256, return_tensors="pt")
    outputs = model(**inputs.to(device))
    # One predicted class index per input (model.config.id2label maps it to a name).
    return torch.argmax(outputs.logits, dim=-1).tolist()


print(predict(["bu bir test metnidir."]))
# [0]
```
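The pipeline reports a score for every label, while `predict` returns only the index of the largest logit. The sketch below shows how the two relate, assuming the pipeline applies a softmax over the two logits (its default for single-label classification) and that index 0 is `NOT` and index 1 is `OFF`, as the outputs above suggest. The helper names `logits_to_scores` and `argmax`, the `ID2LABEL` dict, and the logit values are hypothetical and for illustration only; with the real model, read the mapping from `model.config.id2label`.

```python
import math

# Assumed label mapping, consistent with the outputs above (index 0 -> NOT).
# With the real model, use model.config.id2label instead.
ID2LABEL = {0: "NOT", 1: "OFF"}


def logits_to_scores(logits):
    """Hypothetical helper: turn one row of logits into pipeline-style
    [{'label': ..., 'score': ...}] dicts, sorted by descending score."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    scores = [{"label": ID2LABEL[i], "score": e / total} for i, e in enumerate(exps)]
    return sorted(scores, key=lambda d: d["score"], reverse=True)


def argmax(logits):
    # Same decision rule as torch.argmax(outputs.logits, dim=-1) for a single row.
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for one input (not produced by the real model).
example_logits = [2.9, -2.9]
print(logits_to_scores(example_logits))
print(argmax(example_logits))  # 0, i.e. NOT
```

Because softmax preserves the ordering of the logits, the top-scoring label from the pipeline always corresponds to the index returned by `predict`.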
Test Results
Results on the test split of the coltekin/offenseval2020_tr dataset used for fine-tuning:
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| NOT | 0.9162 | 0.9559 | 0.9356 | 2812 |
| OFF | 0.7912 | 0.6564 | 0.7176 | 716 |
| accuracy | | | 0.8951 | 3528 |
| macro avg | 0.8537 | 0.8062 | 0.8266 | 3528 |
| weighted avg | 0.8908 | 0.8951 | 0.8914 | 3528 |
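The summary rows follow from the per-class rows. The short check below recomputes them using only the rounded values in the table: F1 is the harmonic mean of precision and recall, the macro average is the unweighted mean over the two classes, and the weighted average weights each class by its support. Recomputing from 4-decimal inputs can shift the last digit slightly (for example, OFF F1 comes out near 0.7175 rather than 0.7176).

```python
# Per-class values copied from the table above (rounded to 4 decimals).
rows = {
    "NOT": {"precision": 0.9162, "recall": 0.9559, "support": 2812},
    "OFF": {"precision": 0.7912, "recall": 0.6564, "support": 716},
}


def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


for row in rows.values():
    row["f1"] = f1(row["precision"], row["recall"])

total = sum(row["support"] for row in rows.values())


def macro(metric):
    # Unweighted mean over classes.
    return sum(row[metric] for row in rows.values()) / len(rows)


def weighted(metric):
    # Mean over classes, each weighted by its support.
    return sum(row[metric] * row["support"] for row in rows.values()) / total


for metric in ("precision", "recall", "f1"):
    print(f"{metric:>9}: macro={macro(metric):.4f} weighted={weighted(metric):.4f}")
```

The weighted-average recall always equals overall accuracy (0.8951 here), because weighting each class recall by its support simply counts the correct predictions over all 3528 samples.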