metadata

language:
  - de
license: apache-2.0
tags:
  - dpo
  - alignment-handbook
  - gptq
  - quantization

GPTQ-Version of Phoenix

Bits	Group Size	GPTQ Dataset	Sequence Length
4	128	c4	4096
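As a rough illustration of what these settings imply for storage, the sketch below estimates the average number of bits per weight for 4-bit quantization with a group size of 128. It assumes that each group stores one 16-bit scale and one 4-bit zero point, and that all 7e9 parameters are quantized; both are simplifying assumptions, not details taken from this checkpoint. The calibration dataset (c4) and sequence length (4096) affect quantization quality, not size, so they do not appear in the calculation.

```python
def gptq_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Average storage per quantized weight, in bits.

    Assumption: every group of `group_size` weights stores one scale
    (`scale_bits` bits) and one zero point (`bits` bits).
    """
    overhead = (scale_bits + bits) / group_size
    return bits + overhead


def approx_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough size in GB (10**9 bytes) if every parameter were quantized."""
    return num_params * bits_per_weight / 8 / 1e9


# Values from the table above; 7e9 parameters is a round-number assumption.
bpw = gptq_bits_per_weight(bits=4, group_size=128)
print(f"{bpw:.5f} bits per weight")
print(f"{approx_size_gb(7e9, bpw):.2f} GB (rough estimate)")
```

The real file size will differ, since embeddings and some other layers are usually kept in higher precision.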

Model Card for Phoenix

Phoenix is a model trained with Direct Preference Optimization (DPO) for the German language. Its training procedure follows the alignment-handbook recipe from Hugging Face. In contrast to Zephyr and Notus, this model was trained on German instruction and DPO data. Specifically, German translations of HuggingFaceH4/ultrachat_200k and HuggingFaceH4/ultrafeedback_binarized were created, in addition to a series of already available instruction datasets. The LLM haoranxu/ALMA-13B was used for the translation. While the Mistral model performs really well, it is not well suited to the German language, so we used the fantastic LeoLM/leo-mistral-hessianai-7b as the base model. Thanks to this type of training, Phoenix not only competes with the Mistral model from LeoLM but also beats the Llama-70b-chat model in two MT-Bench categories.

This model wouldn't have been possible without the amazing work of Hugging Face, LeoLM, OpenBMB, Argilla, the ALMA team and many others in the AI community. I would like to personally thank all AI researchers who make the training of such models possible.

MT-Bench-DE Scores

Phoenix beats the LeoLM-Mistral model in all categories except coding and humanities. It also beats LeoLM/Llama-2-70b-chat in roleplay and reasoning, which shows the power of DPO.

{
    "first_turn": 6.39375,
    "second_turn": 5.1625,
    "categories": {
        "writing": 7.45,
        "roleplay": 7.9,
        "reasoning": 4.3,
        "math": 3.25,
        "coding": 2.5,
        "extraction": 5.9,
        "stem": 7.125,
        "humanities": 7.8
    },
    "average": 5.778124999999999
}
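The reported average can be checked against the individual scores. The short sketch below recomputes it as the unweighted mean of the eight categories and as the mean of the two turn scores; that the average was produced this way is an assumption, although both values agree with the reported number.

```python
import json
import math
from statistics import mean

# Scores copied from the JSON above.
scores = json.loads("""
{
    "first_turn": 6.39375,
    "second_turn": 5.1625,
    "categories": {
        "writing": 7.45, "roleplay": 7.9, "reasoning": 4.3, "math": 3.25,
        "coding": 2.5, "extraction": 5.9, "stem": 7.125, "humanities": 7.8
    },
    "average": 5.778124999999999
}
""")

category_mean = mean(scores["categories"].values())
turn_mean = mean([scores["first_turn"], scores["second_turn"]])

# Both means should match the reported average up to floating-point noise.
print(math.isclose(category_mean, scores["average"], rel_tol=1e-9))
print(math.isclose(turn_mean, scores["average"], rel_tol=1e-9))

# Strongest and weakest categories.
best = max(scores["categories"], key=scores["categories"].get)
worst = min(scores["categories"], key=scores["categories"].get)
print(best, worst)
```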

Other Evaluations

Florian Leurer compared Phoenix to other LLMs. Check it out here:

'Evaluation of German LLMs'

Model Details

Model Description

Developed by: Matthias Uhlig (based on the previous efforts and amazing work of Hugging Face H4, Argilla and MistralAI)
Shared by: Matthias Uhlig
Model type: GPT-like 7B model, fine-tuned with DPO
Language(s) (NLP): German
License: Apache 2.0 (same as alignment-handbook/zephyr-7b-dpo-full)
Finetuned from model: LeoLM/leo-mistral-hessianai-7b

Model Sources

Repository: -
Paper: PHOENIX: Open-Source Language Adaption for Direct Preference Optimization
Demo: -

Training Details

Training Hardware

We used a VM with 8 x A100 80GB GPUs hosted on RunPod (runpod.io).

Training Data

We used newly translated versions of HuggingFaceH4/ultrachat_200k and argilla/ultrafeedback-binarized-preferences.

The data used for training will be made public after additional quality inspection.

Prompt template

We use the same prompt template as HuggingFaceH4/zephyr-7b-beta:

<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>

It is also possible to use the model in a multi-turn setup:

<|system|>
</s>
<|user|>
{prompt_1}</s>
<|assistant|>
{answer_1}</s>
<|user|>
{prompt_2}</s>
<|assistant|>
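For illustration, the templates above can be reproduced with a few lines of plain Python. This is a minimal sketch: the helper name `build_prompt` is hypothetical, and the exact whitespace (including the newline after the final `<|assistant|>`) is an assumption based on the layout shown here. In practice the tokenizer's own chat template is authoritative.

```python
def build_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the template shown above.

    Hypothetical helper for illustration only; prefer
    tokenizer.apply_chat_template when the tokenizer is available.
    """
    parts = []
    for message in messages:
        # Each turn: role marker, newline, content, end-of-sequence token.
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    if add_generation_prompt:
        # Trailing marker that asks the model to write the next answer.
        parts.append("<|assistant|>\n")
    return "".join(parts)


conversation = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Erkläre mir was KI ist"},
]
print(build_prompt(conversation))
```

For a multi-turn prompt, append the previous answer as an `assistant` message followed by the next `user` message before calling the helper.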

Usage

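You will first need to install transformers and accelerate (the latter just to ease device placement). Loading GPTQ weights through transformers typically also requires optimum and a GPTQ backend such as auto-gptq. Then you can run the following: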

Via `generate`

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and its tokenizer; device_map="auto" places the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained("DRXD1000/Phoenix-GPTQ", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("DRXD1000/Phoenix-GPTQ")

prompt = [
    {
        "role": "system",
        "content": "",  # Leave empty: system prompts are not recommended, Phoenix does not react well to them
    },
    {"role": "user", "content": "Erkläre mir was KI ist"},
]
# add_generation_prompt=True appends the <|assistant|> marker so the model starts its answer
inputs = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, num_return_sequences=1, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
# The decoded text contains the prompt followed by the generated answer
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
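Because `outputs[0]` holds the prompt tokens followed by the generated tokens, the decoded `response` contains the whole conversation. A minimal sketch for keeping only the last answer is shown below. It assumes that the `<|assistant|>` role marker is ordinary text that survives `skip_special_tokens=True`; the helper name `extract_last_answer` is hypothetical, and the sample string is a made-up placeholder, not real model output.

```python
def extract_last_answer(decoded: str, marker: str = "<|assistant|>") -> str:
    """Return the text after the last role marker, or the full text if absent."""
    _, found, tail = decoded.rpartition(marker)
    return tail.strip() if found else decoded.strip()


# Placeholder text standing in for a decoded response (not real model output).
sample = (
    "<|system|>\n\n"
    "<|user|>\nErkläre mir was KI ist\n"
    "<|assistant|>\nKI steht für künstliche Intelligenz."
)
print(extract_last_answer(sample))
```

An alternative that does not rely on the marker is to decode only the newly generated tokens, for example `outputs[0][inputs.shape[1]:]`.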

Ethical Considerations and Limitations

As with all LLMs, the potential outputs of DRXD1000/Phoenix cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of DRXD1000/Phoenix, developers should perform safety testing and tuning tailored to their specific applications of the model. Please see Meta's Responsible Use Guide.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

SFT Training

learning_rate: 2e-05
train_batch_size: 32
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 512
total_eval_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
num_epochs: 1
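The total batch sizes listed above follow from the per-device values. As a quick sanity check, the sketch below assumes the usual relation of per-device batch size times number of devices times gradient accumulation steps, with no accumulation during evaluation; the function name is hypothetical.

```python
def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of samples that contribute to one optimizer step (or one eval step)."""
    return per_device * num_devices * grad_accum_steps


# SFT values from the list above.
sft_train_total = total_batch_size(per_device=32, num_devices=8, grad_accum_steps=2)
sft_eval_total = total_batch_size(per_device=16, num_devices=8)

print(sft_train_total, sft_eval_total)
```

Both results agree with the reported total_train_batch_size of 512 and total_eval_batch_size of 128.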

DPO Training

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
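To make the DPO learning-rate settings concrete, the sketch below shows one common reading of a linear scheduler with a warmup ratio of 0.1: the rate rises linearly from 0 to the peak of 5e-07 over the first 10% of steps, then decays linearly back to 0. The total step count of 1000 is a hypothetical value, since the real number of steps is not stated here, and the function is an illustration rather than the trainer's actual implementation.

```python
import math


def linear_warmup_lr(step: int, total_steps: int,
                     peak_lr: float = 5e-07, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale up from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale down from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # hypothetical; the real step count is not given above
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```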

Citation

@misc{uhlig2024phoenix,
      title={PHOENIX: Open-Source Language Adaption for Direct Preference Optimization}, 
      author={Matthias Uhlig and Sigurd Schacht and Sudarshan Kamath Barkur},
      year={2024},
      eprint={2401.10580},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Framework versions

Transformers 4.35.0
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.14.1