Text Classification
PyTorch
English
deberta
Trained with AutoTrain
mitulr Satya10 committed on
Commit
8c2a438
0 Parent(s):

Duplicate from KoalaAI/Text-Moderation


Co-authored-by: Satya <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,38 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,129 @@
+ ---
+ tags:
+ - autotrain
+ - text-classification
+ language:
+ - en
+ widget:
+ - text: I love AutoTrain
+ - text: I absolutely hate those people
+ - text: I love cake!
+ - text: >-
+ lets build the wall and deport illegals "they walk across the border like
+ this is Central park"
+ - text: EU offers to pay countries 6,000 euros per person to take in migrants
+ datasets:
+ - mmathys/openai-moderation-api-evaluation
+ - KoalaAI/Text-Moderation-v2-small
+ co2_eq_emissions:
+ emissions: 0.03967468113268738
+ license: openrail
+ ---
+
+ # Text Moderation
+ This model is a text-classification model based on DeBERTa-v3 that predicts whether a piece of text contains content that could be considered offensive.
+ It classifies text into the following labels:
+
+ | Category | Label | Definition |
+ | -------- | ----- | ---------- |
+ | sexual | `S` | Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness). |
+ | hate | `H` | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. |
+ | violence | `V` | Content that promotes or glorifies violence or celebrates the suffering or humiliation of others. |
+ | harassment | `HR` | Content that may be used to torment or annoy individuals in real life, or make harassment more likely to occur. |
+ | self-harm | `SH` | Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders. |
+ | sexual/minors | `S3` | Sexual content that includes an individual who is under 18 years old. |
+ | hate/threatening | `H2` | Hateful content that also includes violence or serious harm towards the targeted group. |
+ | violence/graphic | `V2` | Violent content that depicts death, violence, or serious physical injury in extreme graphic detail. |
+ | OK | `OK` | Not offensive |
+
+ It's important to remember that this model was trained only on English texts, and may not perform well on non-English inputs.
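For downstream code it is handy to map the short label codes above back to readable category names. A minimal sketch — the dictionary below is derived from the table in this card, not shipped with the model files:

```python
# Illustrative mapping from the model's short label codes to the
# category names in the table above (not part of the model itself).
LABEL_NAMES = {
    "S": "sexual",
    "H": "hate",
    "V": "violence",
    "HR": "harassment",
    "SH": "self-harm",
    "S3": "sexual/minors",
    "H2": "hate/threatening",
    "V2": "violence/graphic",
    "OK": "not offensive",
}

def describe(label: str) -> str:
    # Fall back to the raw code for any unknown label
    return LABEL_NAMES.get(label, label)

print(describe("HR"))  # harassment
```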
+
+ ## Ethical Considerations
+ This is a model that deals with sensitive and potentially harmful language. Users should consider the ethical implications and potential risks of using or deploying this model in their applications or contexts. Some of the ethical issues that may arise are:
+
+ - The model may reinforce or amplify existing biases or stereotypes present in the data or in society. For example, the model may associate certain words or topics with offensive language based on their frequency or co-occurrence in the data, without considering the meaning or intent behind them. This may result in unfair or inaccurate predictions for some groups or individuals.
+
+ Users should carefully consider the purpose, context, and impact of using this model, and take appropriate measures to prevent or mitigate any potential harm. Users should also respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions.
+
+ ## License
+
+ This model is licensed under the CodeML OpenRAIL-M 0.1 license, which is a variant of the BigCode OpenRAIL-M license. This license allows you to freely access, use, modify, and distribute this model and its derivatives for research, commercial, or non-commercial purposes, as long as you comply with the following conditions:
+
+ - You must include a copy of the license and the original source of the model in any copies or derivatives of the model that you distribute.
+ - You must not use the model or its derivatives for any unlawful, harmful, abusive, discriminatory, or offensive purposes, or to cause or contribute to any social or environmental harm.
+ - You must respect the privacy and consent of the data subjects whose data was used to train or evaluate the model, and adhere to the relevant laws and regulations in your jurisdiction.
+ - You must acknowledge that the model and its derivatives are provided "as is", without any warranties or guarantees of any kind, and that the licensor is not liable for any damages or losses arising from your use of the model or its derivatives.
+
+ By accessing or using this model, you agree to be bound by the terms of this license. If you do not agree with the terms of this license, you must not access or use this model.
+
+ ## Training Details
+ - Problem type: Multi-class Classification
+ - CO2 Emissions (in grams): 0.0397
+
+ ## Validation Metrics
+
+ - Loss: 0.848
+ - Accuracy: 0.749 (75%)
+ - Macro F1: 0.326
+ - Micro F1: 0.749
+ - Weighted F1: 0.703
+ - Macro Precision: 0.321
+ - Micro Precision: 0.749
+ - Weighted Precision: 0.671
+ - Macro Recall: 0.349
+ - Micro Recall: 0.749
+ - Weighted Recall: 0.749
+
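The gap between macro F1 (0.326) and micro F1 (0.749) is typical of class-imbalanced data: micro averaging pools all predictions, so the dominant class drives the score, while macro averaging weights every class equally. A self-contained sketch of the two averages, using invented toy labels rather than this model's evaluation data:

```python
# Toy illustration of macro vs micro F1 under class imbalance.
# The labels below are made up; they are not the model's eval set.

def f1_averages(y_true, y_pred, labels):
    per_class = []
    tp_sum = fp_sum = fn_sum = 0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)          # classes weighted equally
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)  # predictions pooled
    return macro, micro

# A classifier that always predicts the majority class "OK":
y_true = ["OK", "OK", "OK", "H"]
y_pred = ["OK", "OK", "OK", "OK"]
macro, micro = f1_averages(y_true, y_pred, ["OK", "H"])
print(f"macro={macro:.3f} micro={micro:.3f}")  # macro=0.429 micro=0.750
```

Micro F1 looks healthy even though the minority class is never predicted, which mirrors the pattern in the metrics above.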
+ ## Usage
+
+ You can use cURL to access this model:
+
+ ```
+ $ curl -X POST -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "I love AutoTrain"}' https://api-inference.huggingface.co/models/KoalaAI/Text-Moderation
+ ```
+
+ Or use the Python API:
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ # Load the model and tokenizer
+ model = AutoModelForSequenceClassification.from_pretrained("KoalaAI/Text-Moderation")
+ tokenizer = AutoTokenizer.from_pretrained("KoalaAI/Text-Moderation")
+
+ # Run the model on your input
+ inputs = tokenizer("I love AutoTrain", return_tensors="pt")
+ outputs = model(**inputs)
+
+ # Get the predicted logits
+ logits = outputs.logits
+
+ # Apply softmax to get probabilities (scores)
+ probabilities = logits.softmax(dim=-1).squeeze()
+
+ # Retrieve the labels
+ id2label = model.config.id2label
+ labels = [id2label[idx] for idx in range(len(probabilities))]
+
+ # Combine labels and probabilities, then sort
+ label_prob_pairs = list(zip(labels, probabilities))
+ label_prob_pairs.sort(key=lambda item: item[1], reverse=True)
+
+ # Print the sorted results
+ for label, probability in label_prob_pairs:
+     print(f"Label: {label} - Probability: {probability:.4f}")
+ ```
+
+ The output of the above Python code will look like this:
+ ```
+ Label: OK - Probability: 0.9840
+ Label: H - Probability: 0.0043
+ Label: SH - Probability: 0.0039
+ Label: V - Probability: 0.0019
+ Label: S - Probability: 0.0018
+ Label: HR - Probability: 0.0015
+ Label: V2 - Probability: 0.0011
+ Label: S3 - Probability: 0.0010
+ Label: H2 - Probability: 0.0006
+ ```
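The softmax step in the snippet above is what turns the model's raw logits into the probability scores shown. A dependency-free sketch of that transformation — the logit values here are made up for illustration, but the label order matches `id2label` in this repo's config.json:

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits in the model's id2label order:
labels = ["H", "H2", "HR", "OK", "S", "S3", "SH", "V", "V2"]
logits = [-1.2, -3.0, -2.1, 4.5, -1.9, -2.8, -1.4, -2.2, -2.6]

probs = softmax(logits)
best = max(zip(labels, probs), key=lambda lp: lp[1])
print(best[0])  # OK
```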
config.json ADDED
@@ -0,0 +1,57 @@
+ {
+ "_name_or_path": "AutoTrain",
+ "_num_labels": 9,
+ "architectures": [
+ "DebertaForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "H",
+ "1": "H2",
+ "2": "HR",
+ "3": "OK",
+ "4": "S",
+ "5": "S3",
+ "6": "SH",
+ "7": "V",
+ "8": "V2"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "H": 0,
+ "H2": 1,
+ "HR": 2,
+ "OK": 3,
+ "S": 4,
+ "S3": 5,
+ "SH": 6,
+ "V": 7,
+ "V2": 8
+ },
+ "layer_norm_eps": 1e-07,
+ "max_length": 384,
+ "max_position_embeddings": 512,
+ "max_relative_positions": -1,
+ "model_type": "deberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "padding": "max_length",
+ "pooler_dropout": 0,
+ "pooler_hidden_act": "gelu",
+ "pooler_hidden_size": 768,
+ "pos_att_type": [
+ "c2p",
+ "p2c"
+ ],
+ "position_biased_input": false,
+ "relative_attention": true,
+ "torch_dtype": "float32",
+ "transformers_version": "4.29.2",
+ "type_vocab_size": 0,
+ "vocab_size": 50265
+ }
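A quick sanity check one can run on a local copy of this config: `id2label` and `label2id` should be exact inverses. A small sketch using only the standard library — the inline JSON below is abridged from the file above to just the two label maps:

```python
import json

# Abridged fragment of the config.json above (label maps only).
config = json.loads("""
{
  "id2label": {"0": "H", "1": "H2", "2": "HR", "3": "OK", "4": "S",
               "5": "S3", "6": "SH", "7": "V", "8": "V2"},
  "label2id": {"H": 0, "H2": 1, "HR": 2, "OK": 3, "S": 4,
               "S3": 5, "SH": 6, "V": 7, "V2": 8}
}
""")

# JSON object keys are strings, so convert id2label's keys to ints,
# then invert and compare against label2id.
id2label = {int(k): v for k, v in config["id2label"].items()}
inverted = {v: k for k, v in id2label.items()}
assert inverted == config["label2id"]  # the two maps agree
print(sorted(id2label.values()))
```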
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ea5f0e4c73533341324519717de164dd3f788f42ff52b3658121dd00ef723c2c
+ size 556825292
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d990e67de0fde200e70d95bfe8a65d676a4c3644e2f9483a32658358842a156
+ size 556870129
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc39e4a9ef588149e8513a2d7e0d1cda450b1884a5f6c62a945d9ec1e5bdbfe4
+ size 2109876
tokenizer_config.json ADDED
@@ -0,0 +1,66 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "bos_token": {
+ "__type": "AddedToken",
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": {
+ "__type": "AddedToken",
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "do_lower_case": false,
+ "eos_token": {
+ "__type": "AddedToken",
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "errors": "replace",
+ "mask_token": {
+ "__type": "AddedToken",
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "model_max_length": 512,
+ "pad_token": {
+ "__type": "AddedToken",
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "__type": "AddedToken",
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "tokenizer_class": "DebertaTokenizer",
+ "unk_token": {
+ "__type": "AddedToken",
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "vocab_type": "gpt2"
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff