Safetensors · bert

Commit 1188bd8 (verified) by lordofthejars · 1 parent: 08e99ca

Commit model with tokenizer
README.md ADDED
@@ -0,0 +1,330 @@
---
license: apache-2.0
---

<div align="center">

**⚠️ Disclaimer:**
The Hugging Face models currently give different results to the detoxify library (see the issue [here](https://github.com/unitaryai/detoxify/issues/15)). For the most up-to-date models we recommend using the models from https://github.com/unitaryai/detoxify

# 🙊 Detoxify
## Toxic Comment Classification with ⚡ Pytorch Lightning and 🤗 Transformers

![CI testing](https://github.com/unitaryai/detoxify/workflows/CI%20testing/badge.svg)
![Lint](https://github.com/unitaryai/detoxify/workflows/Lint/badge.svg)

</div>

![Examples image](examples.png)

## Description

Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification.

Built by [Laura Hanu](https://laurahanu.github.io/) at [Unitary](https://www.unitary.ai/), where we are working to stop harmful content online by interpreting visual content in context.

Dependencies:
- For inference:
  - 🤗 Transformers
  - ⚡ Pytorch Lightning
- For training you will also need:
  - Kaggle API (to download data)

| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score | Detoxify Score |
|-|-|-|-|-|-|-|
| [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) | 2018 | build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | `original` | 0.98856 | 0.98636 |
| [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) | 2019 | build a model that recognizes toxicity and minimizes unintended bias with respect to mentions of identities, using a dataset labelled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | `unbiased` | 0.94734 | 0.93639 |
| [Jigsaw Multilingual Toxic Comment Classification](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification) | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | `multilingual` | 0.9536 | 0.91655* |

*Score not directly comparable since it is obtained on the provided validation set and not on the test set. To be updated when the test labels are made available.

It is also worth mentioning that the top leaderboard scores were achieved with model ensembles; the purpose of this library is to provide something user-friendly and straightforward to use.
## Limitations and ethical considerations

If words associated with swearing, insults or profanity are present in a comment, it is likely to be classified as toxic regardless of the tone or intent of the author, e.g. humorous or self-deprecating. This could introduce biases against already vulnerable minority groups.

The intended use of this library is for research purposes, for fine-tuning on carefully constructed datasets that reflect real-world demographics, and/or to help content moderators flag harmful content more quickly.

Some useful resources about the risk of different biases in toxicity or hate speech detection are:
- [The Risk of Racial Bias in Hate Speech Detection](https://homes.cs.washington.edu/~msap/pdfs/sap2019risk.pdf)
- [Automated Hate Speech Detection and the Problem of Offensive Language](https://arxiv.org/pdf/1703.04009.pdf%201.pdf)
- [Racial Bias in Hate Speech and Abusive Language Detection Datasets](https://arxiv.org/pdf/1905.12516.pdf)
## Quick prediction

The `multilingual` model has been trained on 7 different languages, so it should only be tested on: `english`, `french`, `spanish`, `italian`, `portuguese`, `turkish` or `russian`.

```bash
# install detoxify
pip install detoxify
```
```python
from detoxify import Detoxify

# each model takes in either a string or a list of strings
results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1', 'example text 2'])

input_text = ['example text', 'exemple de texte', 'texto de ejemplo', 'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

# optional: display results nicely (requires pip install pandas)
import pandas as pd

print(pd.DataFrame(results, index=input_text).round(5))
```
For more details check the Prediction section.
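For orientation, `predict` returns a plain dictionary keyed by label name, with shapes as implied by the pandas example above. A short sketch; the scores shown in the comments are made-up illustrative values, not real model outputs:

```python
from detoxify import Detoxify

# a single string yields one score per label
single = Detoxify('original').predict('example text')
# e.g. {'toxic': 0.00078, 'severe_toxic': 0.00011, 'obscene': 0.00019,
#       'threat': 0.00012, 'insult': 0.00017, 'identity_hate': 0.00014}

# a list of strings yields a list of scores per label, in input order,
# which is why the DataFrame above indexes its rows by the input texts
batch = Detoxify('original').predict(['example text 1', 'example text 2'])
# e.g. {'toxic': [0.00078, 0.00091], 'severe_toxic': [0.00011, 0.00013], ...}
```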
## Labels
All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:
- **Very Toxic** (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective)
- **Toxic** (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective)
- **Hard to Say**
- **Not Toxic**

More information about the labelling schema can be found [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).

### Toxic Comment Classification Challenge
This challenge includes the following labels (a small thresholding sketch follows the list):

- `toxic`
- `severe_toxic`
- `obscene`
- `threat`
- `insult`
- `identity_hate`

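A hedged sketch of how these per-label scores are commonly consumed; the 0.5 cut-off is an arbitrary illustrative choice, not something the library prescribes:

```python
from detoxify import Detoxify

LABELS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

scores = Detoxify('original').predict('example text')

# turn the per-label probabilities into binary flags at an arbitrary 0.5 cut-off
flags = {label: scores[label] >= 0.5 for label in LABELS}
print(flags)
```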
### Jigsaw Unintended Bias in Toxicity Classification
This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments.

Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation.

- `toxicity`
- `severe_toxicity`
- `obscene`
- `threat`
- `insult`
- `identity_attack`
- `sexual_explicit`

Identity labels used:
- `male`
- `female`
- `homosexual_gay_or_lesbian`
- `christian`
- `jewish`
- `muslim`
- `black`
- `white`
- `psychiatric_or_mental_illness`

A complete list of all the identity labels available can be found [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).

### Jigsaw Multilingual Toxic Comment Classification

Since this challenge combines the data from the previous 2 challenges, it includes all labels from above; however, the final evaluation is only on:

- `toxicity`

## How to run

First, install dependencies
```bash
# clone project
git clone https://github.com/unitaryai/detoxify

# create virtual env
python3 -m venv toxic-env
source toxic-env/bin/activate

# install project
pip install -e detoxify
cd detoxify

# for training
pip install -r requirements.txt
```

## Prediction

Trained models summary:

| Model name | Transformer type | Data from |
|:--:|:--:|:--:|
| `original` | `bert-base-uncased` | Toxic Comment Classification Challenge |
| `unbiased` | `roberta-base` | Unintended Bias in Toxicity Classification |
| `multilingual` | `xlm-roberta-base` | Multilingual Toxic Comment Classification |

For a quick prediction, you can run the example script on a comment directly or on a txt file containing a list of comments.
```bash
# load model via torch.hub
python run_prediction.py --input 'example' --model_name original

# load model from checkpoint path
python run_prediction.py --input 'example' --from_ckpt_path model_path

# save results to a .csv file
python run_prediction.py --input test_set.txt --model_name original --save_to results.csv

# to see usage
python run_prediction.py --help
```

Checkpoints can be downloaded from the latest release or via the Pytorch hub API with the following names:
- `toxic_bert`
- `unbiased_toxic_roberta`
- `multilingual_toxic_xlm_r`
```python
model = torch.hub.load('unitaryai/detoxify', 'toxic_bert')
```
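As a hedged sketch of using a hub-loaded checkpoint without the `detoxify` wrapper, assuming the `toxic_bert` entry point returns a standard 🤗 `BertForSequenceClassification` fine-tuned from `bert-base-uncased` (if in doubt, prefer the `Detoxify` class shown below):

```python
import torch
from transformers import AutoTokenizer

# load the fine-tuned weights via the PyTorch hub entry point
model = torch.hub.load('unitaryai/detoxify', 'toxic_bert')
model.eval()

# assumption: the stock bert-base-uncased tokenizer matches this checkpoint
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

inputs = tokenizer(['example text'], return_tensors='pt', truncation=True, padding=True)
with torch.no_grad():
    logits = model(**inputs).logits

# multi-label head: one sigmoid per label rather than a softmax over labels
probs = torch.sigmoid(logits)[0]
print({model.config.id2label[i]: round(p.item(), 5) for i, p in enumerate(probs)})
```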

Importing detoxify in python:

```python
from detoxify import Detoxify

results = Detoxify('original').predict('some text')

results = Detoxify('unbiased').predict(['example text 1', 'example text 2'])

input_text = ['example text', 'exemple de texte', 'texto de ejemplo', 'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

# to display results nicely
import pandas as pd

print(pd.DataFrame(results, index=input_text).round(5))
```

## Training

If you do not already have a Kaggle account:
- you need to create one to be able to download the data
- go to My Account and click on Create New API Token - this will download a kaggle.json file
- make sure this file is located in ~/.kaggle (see the sketch after this list)

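A minimal sketch of that last step, assuming the token was saved to ~/Downloads (a hypothetical path); the chmod avoids the Kaggle CLI's warning about credentials being readable by other users:

```bash
mkdir -p ~/.kaggle
mv ~/Downloads/kaggle.json ~/.kaggle/   # adjust the source path to wherever kaggle.json was downloaded
chmod 600 ~/.kaggle/kaggle.json
```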
```bash
# create data directory
mkdir jigsaw_data
cd jigsaw_data

# download data
kaggle competitions download -c jigsaw-toxic-comment-classification-challenge

kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification

kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification
```
## Start Training
### Toxic Comment Classification Challenge

```bash
python create_val_set.py

python train.py --config configs/Toxic_comment_classification_BERT.json
```
### Unintended Bias in Toxicity Challenge

```bash
python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa.json
```
### Multilingual Toxic Comment Classification

This model is trained in 2 stages: first on all available data, and second only on the translated versions of the first challenge.

The [translated data](https://www.kaggle.com/miklgr500/jigsaw-train-multilingual-coments-google-api) can be downloaded from Kaggle in French, Spanish, Italian, Portuguese, Turkish, and Russian (the languages available in the test set).

```bash
# stage 1
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json

# stage 2
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR_stage2.json
```
### Monitor progress with tensorboard

```bash
tensorboard --logdir=./saved
```
## Model Evaluation

### Toxic Comment Classification Challenge

This challenge is evaluated on the mean AUC score of all the labels.

```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
### Unintended Bias in Toxicity Challenge

This challenge is evaluated on a novel bias metric that combines different AUC scores to balance overall performance against unintended bias on identity subgroups. More information on this metric [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/overview/evaluation); a small sketch of the formula follows the commands below.

```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv

# to get the final bias metric
python model_eval/compute_bias_metric.py
```
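For context on what that final metric is, a hedged sketch of the headline score as described on the Kaggle evaluation page (an illustration of the formula, not the repository's implementation in `model_eval/compute_bias_metric.py`):

```python
import numpy as np

def generalized_mean(aucs, p=-5):
    """Generalized power mean used by the competition (p = -5 emphasises the worst subgroups)."""
    aucs = np.asarray(aucs, dtype=float)
    return np.power(np.mean(np.power(aucs, p)), 1 / p)

def final_bias_score(overall_auc, subgroup_aucs, bpsn_aucs, bnsp_aucs, weight=0.25):
    # weighted sum of the overall AUC and the power means of the per-identity
    # subgroup AUC, BPSN (background positive, subgroup negative) AUC and
    # BNSP (background negative, subgroup positive) AUC
    return (weight * overall_auc
            + weight * generalized_mean(subgroup_aucs)
            + weight * generalized_mean(bpsn_aucs)
            + weight * generalized_mean(bnsp_aucs))
```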
### Multilingual Toxic Comment Classification

This challenge is evaluated on the AUC score of the main toxic label.

```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```

### Citation
```
@misc{Detoxify,
  title={Detoxify},
  author={Hanu, Laura and {Unitary team}},
  howpublished={Github. https://github.com/unitaryai/detoxify},
  year={2020}
}
```
config.json ADDED
@@ -0,0 +1,37 @@
{
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "toxic",
    "1": "severe_toxic",
    "2": "obscene",
    "3": "threat",
    "4": "insult",
    "5": "identity_hate"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "toxic": 0,
    "severe_toxic": 1,
    "obscene": 2,
    "threat": 3,
    "insult": 4,
    "identity_hate": 5
  },
  "problem_type": "multi_label_classification",
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 30522
}
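Since this config declares `BertForSequenceClassification` with `problem_type: multi_label_classification` and six `id2label` entries, the checkpoint should also load directly with 🤗 Transformers. A hedged sketch; the repo id below is a placeholder for wherever this model is hosted:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "your-namespace/toxic-bert"  # placeholder: substitute the actual model repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

inputs = tokenizer("example text", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# multi_label_classification head: independent sigmoid per label, not a softmax
probs = torch.sigmoid(logits)[0]
print({model.config.id2label[i]: round(p.item(), 5) for i, p in enumerate(probs)})
```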
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c272885d24138df70bff1b3cd944a999bd6b41dad33209730aa8ba074f6ad09
size 437975136
special_tokens_map.json ADDED
@@ -0,0 +1 @@
{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1 @@
{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "model_max_length": 512, "name_or_path": "bert-base-uncased"}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff