roberta-large Image Prompt Classifier
Model Overview
This model is a fine-tuned version of roberta-large that classifies image generation prompts into three categories: SAFE, QUESTIONABLE, and UNSAFE. It builds on the roberta-large architecture to identify the nature of prompts used for generating images.
Model Details
- Model Name: roberta-large Image Prompt Classifier
- Base Model: roberta-large
- Fine-tuned By: Michał Młodawski
- Categories:
  - 0: SAFE
  - 1: QUESTIONABLE
  - 2: UNSAFE
Use Cases
This model is particularly useful for platforms and applications involving AI-generated content, where it is crucial to filter and classify prompts to maintain content safety and appropriateness. Some potential applications include:
- Content Moderation: Automatically classify and filter prompts to prevent the generation of inappropriate or harmful images.
- User Safety: Enhance user experience by ensuring that generated content adheres to safety guidelines.
- Compliance: Help platforms comply with regulatory requirements by identifying and flagging potentially unsafe prompts.
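As a rough illustration of the content moderation use case, the sketch below maps a predicted class id to a moderation action. The class ids follow the categories listed under Model Details; the `Action` values, the `POLICY` table, and the `route_prompt` helper are hypothetical names chosen for this example and are not part of the model or of any library.

```python
from enum import Enum

# Class ids as listed in this model card.
ID2LABEL = {0: "SAFE", 1: "QUESTIONABLE", 2: "UNSAFE"}


class Action(Enum):
    # Hypothetical moderation actions; adapt them to your platform.
    ALLOW = "allow"
    REVIEW = "review"
    BLOCK = "block"


# Hypothetical policy: one action per label.
POLICY = {
    "SAFE": Action.ALLOW,
    "QUESTIONABLE": Action.REVIEW,
    "UNSAFE": Action.BLOCK,
}


def route_prompt(predicted_class: int) -> Action:
    """Map a predicted class id to a moderation action."""
    label = ID2LABEL.get(predicted_class)
    if label is None:
        # Unknown ids go to manual review rather than being allowed.
        return Action.REVIEW
    return POLICY[label]


if __name__ == "__main__":
    for class_id in (0, 1, 2):
        print(class_id, ID2LABEL[class_id], route_prompt(class_id).value)
```

Sending unknown ids to review is one possible design choice; a stricter platform might block them instead.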
How It Works
The model takes an input prompt and classifies it into one of three categories:
- SAFE: Prompts that are deemed appropriate and free from harmful content.
- QUESTIONABLE: Prompts that may require further review due to potential ambiguity or slight risk.
- UNSAFE: Prompts that are likely to generate inappropriate or harmful content.
The classification is based on the semantic understanding and contextual analysis provided by the roberta-large architecture, fine-tuned on a curated dataset tailored for this specific task.
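To make the last step of the classification concrete, the sketch below shows how three raw scores (logits) can be turned into a label and a probability with a softmax. It uses plain Python so it runs without the model. The logit values are made up for illustration, and `softmax` and `logits_to_label` are hypothetical helper names, not part of the model's API.

```python
import math

# Class ids as listed in this model card.
ID2LABEL = {0: "SAFE", 1: "QUESTIONABLE", 2: "UNSAFE"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits):
    """Return the label with the highest probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


if __name__ == "__main__":
    # Illustrative logits only, not real model output.
    label, confidence = logits_to_label([2.0, 0.5, -1.0])
    print(label, round(confidence, 3))
```

Picking the largest probability gives the same class as taking the argmax of the logits directly; the softmax only adds a confidence value that can be compared against a threshold.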
Performance
The model has been tested and validated on the classification task. The self-reported metrics are:
- Accuracy: 93%
- Precision: 88%
- Recall: 90%
These metrics highlight the model's capability to effectively distinguish between different categories of prompts with high confidence.
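For readers who want to compute comparable metrics on their own labelled prompts, here is a minimal sketch in plain Python. It assumes macro-averaged precision and recall across the three classes; the model card does not state which averaging was used for the reported numbers, so treat that as an assumption. The `evaluate` function and the sample labels are hypothetical.

```python
def evaluate(y_true, y_pred, num_classes=3):
    """Compute accuracy plus macro-averaged precision and recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls = [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Score a class as 0.0 when it has no predictions or no true samples.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
    }


if __name__ == "__main__":
    # Made-up labels for illustration: 0 = SAFE, 1 = QUESTIONABLE, 2 = UNSAFE.
    print(evaluate([0, 1, 2, 2], [0, 1, 2, 1]))
```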
Getting Started
To start using the roberta-large Image Prompt Classifier, you can easily integrate it into your existing workflows with the following steps:
Installation
pip install transformers==4.37.2
pip install torch==2.3.1
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Hugging Face Hub repository that hosts the fine-tuned model
model_repo = "MichalMlodawski/nsfw-text-detection-large"
tokenizer = AutoTokenizer.from_pretrained(model_repo)
model = AutoModelForSequenceClassification.from_pretrained(model_repo)

def classify_text(text):
    # Tokenize the prompt, truncating anything beyond 512 tokens
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # Inference only, so gradient tracking is not needed
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # Index of the highest logit: 0 = SAFE, 1 = QUESTIONABLE, 2 = UNSAFE
    predicted_class = torch.argmax(logits, dim=1).item()
    return predicted_class

if __name__ == "__main__":
    examples = [
        {"text": "wild west cowboy caravan through Arizona Desert , Men Woman and children, horse drawn carriages", "expected_class": 0},
        {"text": "A person with questionable intentions in a dimly lit alley.", "expected_class": 1},
        {"text": "An inappropriate scene involving explicit content.", "expected_class": 2}
    ]
    for example in examples:
        text = example["text"]
        expected_class = example["expected_class"]
        predicted_class = classify_text(text)
        print(f"Prompt: {text}")
        print(f"Predicted class: {predicted_class}")
        print(f"Expected class: {expected_class}")
        print("---")
Disclaimer
The dataset includes actual prompts for image generation that could be perceived as abusive, offensive, or obscene. Furthermore, the examples and data might contain unfavorable information about certain businesses. We merely collect this data and do not assume any legal responsibility. Please note: A portion of the data was created using Large Language Models (LLMs).
Evaluation results
- Accuracy on nsfw-text-detection (self-reported): 93%
- Precision on nsfw-text-detection (self-reported): 88%
- Recall on nsfw-text-detection (self-reported): 90%