mT0-XL (SynthDetoxM Full)
This is a fine-tune of the bigscience/mt0-xl model on a subset of the multilingual text detoxification dataset SynthDetoxM from the NAACL 2025 Main Track paper SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators by Daniil Moskovskiy et al.
Usage
The usage is similar to that of the base mt0-xl model:

```python
from transformers import pipeline

pipe = pipeline("text2text-generation", model="s-nlp/mt0-xl-detox-sdm-full")

toxic_text = "Your toxic text goes here."
# The model expects the "Detoxify: " prefix before the input text.
pipe(f"Detoxify: {toxic_text}")
```
Training Details
The model was fine-tuned for 2 epochs on the s-nlp/synthdetoxm dataset in full precision (FP32), using the Adafactor optimizer with a learning rate of 1e-4, a batch size of 4, and gradient checkpointing enabled. The full training configuration is available below:
```json
{
  "do_train": true,
  "do_eval": true,
  "per_device_train_batch_size": 4,
  "per_device_eval_batch_size": 4,
  "learning_rate": 1e-4,
  "weight_decay": 0,
  "num_train_epochs": 2,
  "gradient_accumulation_steps": 1,
  "logging_strategy": "steps",
  "logging_steps": 1,
  "save_strategy": "epoch",
  "save_total_limit": 1,
  "warmup_steps": 1,
  "report_to": "wandb",
  "optim": "adafactor",
  "lr_scheduler_type": "linear",
  "predict_with_generate": true,
  "bf16": false,
  "gradient_checkpointing": true,
  "output_dir": "/path/",
  "seed": 42
}
```
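For reference, this configuration maps directly onto the Hugging Face Seq2SeqTrainingArguments API (a sketch of the mapping only; the actual training script is available in the GitHub repository linked under Software):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the JSON configuration above; dataset preprocessing and the
# Seq2SeqTrainer setup itself are omitted.
args = Seq2SeqTrainingArguments(
    output_dir="/path/",
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=1e-4,
    weight_decay=0.0,
    num_train_epochs=2,
    gradient_accumulation_steps=1,
    logging_strategy="steps",
    logging_steps=1,
    save_strategy="epoch",
    save_total_limit=1,
    warmup_steps=1,
    report_to="wandb",
    optim="adafactor",
    lr_scheduler_type="linear",
    predict_with_generate=True,
    bf16=False,
    gradient_checkpointing=True,
    seed=42,
)
```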
Metrics
We use the multilingual detoxification evaluation setup from the TextDetox 2024 Multilingual Text Detoxification Shared Task. Specifically, we use the following metrics:
- Style Transfer Accuracy (STA) is calculated with the textdetox/xlmr-large-toxicity-classifier.
- Text Similarity (SIM) is calculated as the cosine similarity of text embeddings produced by the sentence-transformers/LaBSE encoder.
- Fluency (FL) is calculated as the character n-gram F-score (ChrF1).
These metrics are aggregated into a final Joint metric (J), the sentence-level average of their product:

$$J = \frac{1}{n} \sum_{i=1}^{n} \text{STA}(y_i) \cdot \text{SIM}(x_i, y_i) \cdot \text{FL}(y_i),$$

where $x_i$ is a source text and $y_i$ is the corresponding detoxified output.
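A minimal sketch of a per-sentence Joint score under the setup above is given below; the classifier's label names and the exact aggregation details are assumptions and may differ from the official TextDetox 2024 evaluation scripts:

```python
# Sketch only: label name "neutral" and the reference used for ChrF are
# assumptions; consult the official shared-task scripts for exact details.
import torch
from sacrebleu.metrics import CHRF
from sentence_transformers import SentenceTransformer
from transformers import pipeline

sta_clf = pipeline("text-classification",
                   model="textdetox/xlmr-large-toxicity-classifier")
labse = SentenceTransformer("sentence-transformers/LaBSE")
chrf = CHRF()

def joint_score(source: str, output: str, reference: str) -> float:
    # STA: probability that the detoxified output is non-toxic
    pred = sta_clf(output)[0]
    sta = pred["score"] if pred["label"] == "neutral" else 1.0 - pred["score"]
    # SIM: cosine similarity of LaBSE embeddings of source and output
    emb = labse.encode([source, output], convert_to_tensor=True)
    sim = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0).item()
    # FL: ChrF between the output and a reference, rescaled to [0, 1]
    fl = chrf.sentence_score(output, [reference]).score / 100.0
    return sta * sim * fl
```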
Evaluation Results
This model was evaluated on the test set of the textdetox/multilingual_paradetox dataset from the TextDetox 2024 Multilingual Text Detoxification Shared Task.
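The evaluation data can be loaded with the datasets library (a minimal sketch; the configuration and split layout are assumptions based on the dataset card):

```python
from datasets import load_dataset

# Languages are exposed as separate splits on the Hub; inspect the
# returned DatasetDict for the exact split names.
paradetox = load_dataset("textdetox/multilingual_paradetox")
print(paradetox)
```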
The evaluation results (Joint metric, J) are presented below.
| Model | German | Spanish | Russian |
|---|---|---|---|
| Human References | 0.733 | 0.709 | 0.732 |
| *Baselines* | | | |
| Duplicate | 0.287 | 0.090 | 0.048 |
| Delete | 0.362 | 0.319 | 0.255 |
| Backtranslation | 0.233 | 0.275 | 0.223 |
| *mT0-XL supervised fine-tuning* | | | |
| MultiParaDetox (s-nlp/mt0-xl-detox-mpd) | 0.446 | 0.344 | 0.472 |
| SynthDetoxM Subset AVG (this model) | 0.460 | 0.402 | 0.475 |
| SynthDetoxM Full (s-nlp/mt0-xl-detox-sdm-full) | 0.482 | 0.470 | 0.546 |
Software
Code for replicating the results from the paper can be found on GitHub.
Citation
BibTeX:
```bibtex
@misc{moskovskiy2025synthdetoxmmodernllmsfewshot,
  title={SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators},
  author={Daniil Moskovskiy and Nikita Sushko and Sergey Pletenev and Elena Tutubalina and Alexander Panchenko},
  year={2025},
  eprint={2502.06394},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.06394},
}
```
License
This model is licensed under the OpenRAIL++ License, which supports the development of technologies, both industrial and academic, that serve the public good.
Model Card Contact
For any questions, please contact: Daniil Moskovskiy