---
library_name: transformers
license: openrail++
datasets:
- textdetox/multilingual_paradetox
- chameleon-lizard/synthetic-multilingual-paradetox
language:
- ru
- en
- am
- uk
- de
- es
- ar
- hi
- zh
pipeline_tag: text2text-generation
---
# Model Card for detox-mt0-xl
A fine-tune of the mt0-xl model for the text detoxification task.
## Model Details
### Model Description
This is a fine-tune of the mt0-xl model for the text detoxification task. It can also be used to generate synthetic detoxified data from toxic examples.
- **Developed by:** Nikita Sushko
- **Model type:** mt5-xl
- **Language(s) (NLP):** English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi
- **License:** OpenRail++
- **Finetuned from model:** mt0-xl
## Uses
This model is intended for text detoxification in 9 languages: English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, and Hindi.
### Direct Use
The model may be directly used for text detoxification tasks.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import transformers
checkpoint = 'chameleon-lizard/detox-mt0-xl'
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype='auto', device_map="auto")
pipe = transformers.pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
)
language = 'English'
text = "You are a major fucking disappointment."
print(pipe(f'Write a non-toxic version of the following text in {language}: {text}')[0]['generated_text'])
# Resulting text: "You are a major disappointment."
```
Be sure to use the prompt format shown above for best performance. If the target language is not included in the prompt, the model may respond in an arbitrary language.
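As noted in the model description, the same pipeline can be used to produce synthetic detoxified counterparts for a batch of toxic sentences. Below is a minimal sketch that continues from the `pipe` object created above; the `detoxify` helper, the example sentences, and the language choices are illustrative assumptions, not part of the model's API.

```python
# Minimal sketch: batch synthetic data generation, reusing the `pipe` defined above.
# The helper name and example sentences are illustrative only.
def detoxify(text: str, language: str) -> str:
    # The prompt must name the target language, otherwise the output language is unpredictable.
    prompt = f'Write a non-toxic version of the following text in {language}: {text}'
    return pipe(prompt)[0]['generated_text']

toxic_samples = [
    ('English', 'You are a major fucking disappointment.'),
    ('Spanish', 'Eres un maldito desastre.'),  # illustrative example sentence
]

synthetic_pairs = [(text, detoxify(text, language)) for language, text in toxic_samples]
for toxic, detoxed in synthetic_pairs:
    print(f'{toxic} -> {detoxed}')
```

Keeping the target language explicit in every prompt mirrors the format used in the example above and avoids the random-language behavior mentioned earlier.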