---
library_name: transformers
license: mit
datasets:
- textdetox/multilingual_paradetox
- chameleon-lizard/synthetic-multilingual-paradetox
language:
- ru
- en
- am
- uk
- de
- es
- zh
- ar
- hi
pipeline_tag: text2text-generation
---

# Model Card for detox-mt0-xl

A fine-tuned version of mt0-xl for the text detoxification task.

## Model Details

### Model Description

This is a fine-tuned version of mt0-xl for the text detoxification task. It can also be used to generate synthetic parallel data from toxic examples.

- **Developed by:** Nikita Sushko
- **Model type:** mt5-xl
- **Language(s) (NLP):** English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi
- **License:** MIT
- **Finetuned from model:** mt0-xl

## Uses

This model is intended for text detoxification in nine languages: English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, and Hindi.

### Direct Use

The model may be used directly for text detoxification, for example to generate synthetic parallel data from toxic examples, as sketched below.
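
As a concrete illustration of the synthetic-data use case, the sketch below maps toxic inputs to detoxified outputs using the same pipeline setup as the quickstart further down. The `toxic_samples` list and the record layout are assumptions made for this example, not an API of the model.

```python
import transformers

# Same checkpoint and setup as in the quickstart below.
checkpoint = 'chameleon-lizard/detox-mt0-xl'
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint, torch_dtype='auto', device_map='auto'
)
pipe = transformers.pipeline(
    'text2text-generation',
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
)

# Illustrative toxic inputs; in practice these would come from a corpus
# such as textdetox/multilingual_paradetox.
toxic_samples = [
    ('English', 'You are a major fucking disappointment.'),
]

# Build (toxic, neutral) records as synthetic parallel data.
synthetic_pairs = []
for language, text in toxic_samples:
    prompt = f'Write a non-toxic version of the following text in {language}: {text}'
    detoxified = pipe(prompt)[0]['generated_text']
    synthetic_pairs.append({'language': language, 'toxic': text, 'neutral': detoxified})
```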

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import transformers

checkpoint = 'chameleon-lizard/detox-mt0-xl'

# Load the tokenizer and model; device_map="auto" places the model on GPU when available.
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype='auto', device_map="auto")

pipe = transformers.pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
)

language = 'English'
text = "You are a major fucking disappointment."

# The prompt must name the target language and contain the toxic text.
print(pipe(f'Write a non-toxic version of the following text in {language}: {text}')[0]['generated_text'])
# Resulting text: "You are a major disappointment."
```

Be sure to use the prompt format shown above for the best performance. Omitting the target language from the prompt may cause the model to respond in a random language.
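
Because the target language is easy to omit by accident, a small helper like the hypothetical `detox_prompt` below can enforce the expected format. The function name and its validation logic are illustrative conventions, not part of the released checkpoint.

```python
# Supported target languages, mirroring the list in this model card.
SUPPORTED_LANGUAGES = {
    'English', 'Russian', 'Ukrainian', 'Amharic', 'German',
    'Spanish', 'Chinese', 'Arabic', 'Hindi',
}

def detox_prompt(language: str, text: str) -> str:
    """Build a prompt in the format the model expects, failing fast on a bad language."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f'Unsupported target language: {language!r}')
    return f'Write a non-toxic version of the following text in {language}: {text}'

# Example: detox_prompt('English', 'You are a major fucking disappointment.')
```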