---
license: mit
language:
- en
- ja
- zh
- ko
metrics:
- accuracy
base_model: google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
tags:
- sex
- filename
- detection
- content
- mbert
- Multilingual
---
# Model Card for uget/sexual_content_dection
Detects sexual content in text and file names.
## Model Details
### Model Description
- **Developed by:** liu wei
- **License:** MIT
- **Finetuned from model:** bert-base-multilingual-cased
- **Task:** Binary text classification
- **Language:** Multilingual
- **Max Length:** 128 tokens
- **Last Updated:** 2024-08-22
### Model Training Information
- **Training Dataset Size:** 100,000 manually annotated samples, including noisy data
- **Data Distribution:** 50:50 (sexual vs. non-sexual)
- **Batch Size:** 8
- **Epochs:** 5
- **Accuracy:** 92%
- **F1:** 92%
<a href="https://ko-fi.com/ugetai" target="_blank" rel="noopener noreferrer">Buy me a cup of coffee, thanks</a>
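The card reports accuracy and F1 but does not say which evaluation split or which F1 averaging was used. For reference, here is a minimal pure-Python sketch of how the two metrics are defined. It assumes binary F1 computed on the positive class (label `1`, "Sexual"). The function name `accuracy_and_f1` is hypothetical and not part of this repository.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, binary F1 for the positive class). 1 = "Sexual"."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives for label `positive`
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy labels, for illustration only (not the model's evaluation data)
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(acc, round(f1, 3))  # 0.6 0.667
```

A macro-averaged F1 (the mean of per-class F1) would give a different number. That is why the averaging assumption matters when comparing against the 92% figure above.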
## Uses
- Supports multiple languages, such as English, Chinese, Japanese, and Korean.
- Detects sexual content in user submissions on forums, magnet-link search engines, cloud drives, etc.
- Detects sexual content from its semantics, including disguised variants, porn movie catalog numbers, and altered file names.
- According to the author, detection accuracy is substantially higher than that of GPT-4o mini.
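For the moderation use case above (forum posts, file listings, cloud-drive uploads), a classifier is typically applied to each submitted title or file name, and the flagged items are kept apart from the rest. The sketch below is hypothetical glue code and not part of this repository. `split_submissions` accepts any callable that returns the same `{"predictions": ..., "label": ...}` dict as `predict()` in this card. The `fake_predict` stand-in exists only so the example runs without downloading the model, and it does not reflect how the model decides.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def split_submissions(
    names: Iterable[str],
    classify: Callable[[str], Dict[str, object]],
) -> Tuple[List[str], List[str]]:
    """Split submitted titles/file names into (flagged, clean) lists."""
    flagged: List[str] = []
    clean: List[str] = []
    for name in names:
        result = classify(name)
        # predictions == 1 means the classifier labelled the item "Sexual"
        if result["predictions"] == 1:
            flagged.append(name)
        else:
            clean.append(name)
    return flagged, clean


# Stand-in classifier for demonstration only; swap in the real predict().
def fake_predict(text: str) -> Dict[str, object]:
    hit = "DVAJ" in text
    return {"predictions": int(hit), "label": "Sexual" if hit else "None"}


flagged, clean = split_submissions(["DVAJ-548_CH_SD", "holiday_photos.zip"], fake_predict)
print(flagged, clean)  # ['DVAJ-548_CH_SD'] ['holiday_photos.zip']
```

Passing the classifier in as an argument keeps the moderation logic independent of how the model is loaded. The same function therefore works with a local copy of the model or with a remote inference service.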
### Examples
- Example **English**
```python
predict("Tiffany Doll - Wine Makes Me Anal (31.03.2018)_1080p.mp4")
```
```json
{
"predictions": 1,
"label": "Sexual"
}
```
- Example **Chinese**
```python
predict("橙子 · 保安和女业主的一夜春宵。路见不平拔刀相助,救下苏姐,以身相许!")
```
```json
{
"predictions": 1,
"label": "Sexual"
}
```
- Example **Japanese**
```python
predict("MILK-217-UNCENSORED-LEAKピタコス Gカップ痴女 完全着衣で濃密5PLAY 椿りか 580 2.TS")
```
```json
{
"predictions": 1,
"label": "Sexual"
}
```
- Example **Porn Movie Numbers**
```python
predict("DVAJ-548_CH_SD")
```
```json
{
"predictions": 1,
"label": "Sexual"
}
```
## How to Get Started with the Model
### Step 1
Create a Python file, such as `use_model.py`, containing:
```python
import json

import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = BertTokenizer.from_pretrained("uget/sexual_content_dection")
model = BertForSequenceClassification.from_pretrained("uget/sexual_content_dection")
model.eval()  # inference mode (disables dropout)

label_map = {0: "None", 1: "Sexual"}


def predict(text):
    # Tokenize, truncating to the card's stated max length of 128 tokens
    encoding = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    encoding = {k: v.to(model.device) for k, v in encoding.items()}
    with torch.no_grad():
        outputs = model(**encoding)
    # Softmax over the two class logits; argmax picks the predicted class
    probs = torch.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probs, dim=-1).item()
    return {"predictions": prediction, "label": label_map[prediction]}


result = predict("Tiffany Doll - Wine Makes Me Anal (31.03.2018)_1080p.mp4")
print(json.dumps(result, indent=2))
```
### Step 2
Run the script:
```shell
python3 use_model.py
```
Expected output:
```json
{
"predictions": 1,
"label": "Sexual"
}
```
### Explanation
The model returns one of two results:
- `predictions: 0` (label `None`): **no** sexual content was detected;
- `predictions: 1` (label `Sexual`): **sexual** content was detected.
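Each label is the arg-max over the model's two output logits, which inside `predict()` are available as `outputs.logits[0].tolist()`. Some moderation pipelines want a tunable confidence cut-off instead of a hard arg-max. One option, which is an assumption and not something this card provides, is to softmax the two logits and compare P(Sexual) against a threshold. This pure-Python sketch assumes logit index 0 is `None` and index 1 is `Sexual`, matching the label mapping in the example script. `decide` and `threshold` are hypothetical names.

```python
import math
from typing import Sequence, Tuple

LABEL_MAP = {0: "None", 1: "Sexual"}


def decide(logits: Sequence[float], threshold: float = 0.5) -> Tuple[int, str, float]:
    """Turn two logits [None, Sexual] into (prediction, label, P(Sexual))."""
    if len(logits) != 2:
        raise ValueError("expected two logits: [None, Sexual]")
    # Numerically stable softmax: subtract the max logit before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    p_sexual = exps[1] / sum(exps)
    # With the default threshold of 0.5 this gives the same result as argmax
    # (ties go to class 0); a higher threshold flags fewer items.
    prediction = 1 if p_sexual > threshold else 0
    return prediction, LABEL_MAP[prediction], p_sexual


pred, label, p = decide([-1.2, 2.3])  # made-up logits, for illustration
print(pred, label, round(p, 3))  # 1 Sexual 0.971
print(decide([-1.2, 2.3], threshold=0.99)[:2])  # (0, 'None')
```

Raising the threshold trades recall for precision. The appropriate value depends on whether missed content or false flags are more costly for the site, and it would need to be checked on labelled data.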
## Model Card Contact
Email: [email protected]