Spaces:
Runtime error
This is a basic guide that outlines what is necessary to train a NSFW detector on top of a CLIP model.
Note that this is a general guide and meant to be used as a guideline.
Dataset Prep
You will need to obtain thousands of NSFW and SFW images to train a good model.
Once the images are gathered, you can place them in a directory and leverage clip-retrieval to easily create a numpy array of embeddings. (For more details read the docs)
Finally, you will need to create target values which can be done easily in the python interpreter.
Target Values & Dataset Combination
# find out how many positive samples you have
pos_x = np.load(path/to_positive/samples.npy)
neg_x = np.load(path/to_negative/samples.npy)
num_pos = pos_x.shape[0]
num_neg = neg_x.shape[0]
# create target values
pos_y = np.ones((num_pos, 1))
neg_y = np.zeros((num_neg, 1))
# combine the x samples
# NOTE: we will rely on torch dataloader shuffling to break the ordering here
x = np.vstack((pos_x, neg_x))
y = np.vstack((pos_y, neg_y))
# save the dataset x & y
np.save("train_x.npy", x)
np.save("train_y.npy", y)
Model Training
Thankfully it is possible to use a very simple linear model to train the NSFW detector.
For the purposes of this guide we will reference this repo and its model architecture.
NOTE: It is also possible to utilize the training script provided in that repo as boilerplate code, provided you have
.npy
files for your dataset's x & y
Model Architecture
Feel free to tweak the model architecture here, but the important thing to remember is that your input dimension should match the dimension of your CLIP embeddings, and your output dimension should be 1.
import torch.nn as nn
class H14_NSFW_Detector(nn.Module):
def __init__(self, input_size=1024):
super().__init__()
self.input_size = input_size
self.layers = nn.Sequential(
nn.Linear(self.input_size, 1024),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(1024, 2048),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(2048, 1024),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(1024, 256),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(256, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, 16),
nn.Linear(16, 1)
)
def forward(self, x):
return self.layers(x)
Training Snippets
Below is a list of snippets that should walk you through the general steps necessary to train a MLP in PyTorch, however, it is completely fine to replace the existing MLP in this-repo with the model provided above & begin training.
For those of you who wish to build out custom code, the following snippets should get the ball rolling...
Import the necessary libraries:
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader
Define the dataset and data loaders:
# Define the dataset
x = torch.from_numpy(np.load("train_x.npy"))
y = torch.from_numpy(np.load("train_y.npy"))
train_dataset = TensorDataset(x,y)
# Define the data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
Initialize the model
model = H14_NSFW_Detector()
Define the loss function
criterion = nn.MSELoss()
Define the optimizer
# Define the optimizer
optimizer = Adam(model.parameters())
Define the training loop
# Define the number of epochs
num_epochs = 10
# Training loop
for epoch in range(num_epochs):
for inputs, labels in train_loader:
# Clear gradients
optimizer.zero_grad()
# Forward pass
outputs = model(inputs)
# Compute the loss
loss = criterion(outputs, labels)
# Backward pass and optimization
loss.backward()
optimizer.step()
Define an evaluation loop
# Evaluation loop
model.eval()
with torch.no_grad():
for inputs, labels in val_loader:
# Forward pass
outputs = model(inputs)
# Compute the loss
loss = criterion(outputs, labels)
Note that this is a basic guide and you may need to add additional functionality such as model saving and loading, early stopping, etc. You may want to adjust the learning rate, batch size, and number of epochs as well.