---
license: mit
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
- distilabel
- argilla
base_model: microsoft/phi-2
model-index:
- name: phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
  results: []
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
language:
- en
pipeline_tag: text-generation
---

# phi2-lora-quantized-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co./microsoft/phi-2) on [distilabel-intel-orca-dpo-pairs](https://huggingface.co./datasets/argilla/distilabel-intel-orca-dpo-pairs).
The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).

It achieves the following results on the evaluation set:
- Loss: 0.4537
- Rewards/chosen: -0.0837
- Rewards/rejected: -1.2628
- Rewards/accuracies: 0.8301
- Rewards/margins: 1.1791
- Logps/rejected: -224.8409
- Logps/chosen: -203.2228
- Logits/rejected: 0.4773
- Logits/chosen: 0.3062
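The reward metrics above follow the usual DPO bookkeeping: each pair's implicit reward is `beta` times the log-probability ratio between the trained policy and the frozen reference model, and the margin is the chosen reward minus the rejected reward. This is why the reported Rewards/margins (1.1791) equals Rewards/chosen minus Rewards/rejected (-0.0837 - (-1.2628)). The loss, however, averages a nonlinear per-pair term, so it cannot be recovered from the mean margin alone. The stdlib sketch below illustrates these definitions for the sigmoid DPO loss. It is not TRL's implementation; `beta=0.1` and the log-probabilities are assumed, hypothetical values.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Batch-mean DPO metrics for the sigmoid loss (illustrative sketch).

    Each argument holds one summed sequence log-probability per preference
    pair. beta=0.1 is an assumed value; this card does not state it.
    """
    # Implicit rewards: beta * (policy log-prob - reference log-prob)
    chosen = [beta * (p - r) for p, r in zip(policy_chosen_logps, ref_chosen_logps)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected_logps, ref_rejected_logps)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(margins)
    return {
        # -log(sigmoid(m)) == softplus(-m), averaged over pairs
        "loss": sum(softplus(-m) for m in margins) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        # Fraction of pairs where the chosen answer gets the higher reward
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "rewards/margins": sum(margins) / n,
    }


# Hypothetical log-probabilities for two preference pairs (not from this run)
metrics = dpo_metrics(
    policy_chosen_logps=[-10.0, -20.0],
    policy_rejected_logps=[-15.0, -18.0],
    ref_chosen_logps=[-12.0, -21.0],
    ref_rejected_logps=[-11.0, -17.0],
)
print(metrics)
```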

## Model description

This is a LoRA adapter for phi-2, trained with Direct Preference Optimization (DPO) on the [distilabel-intel-orca-dpo-pairs](https://huggingface.co./datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset using a Google Colab A100 GPU. To scale LoRA-based approaches for LLMs, for example serving many adapters on top of a single base model, I recommend looking at [predibase/lorax](https://github.com/predibase/lorax).

You can try the model with the snippet below. It loads the base model, applies a bitsandbytes 4-bit quantization config (only when CUDA is available), and then loads the LoRA adapter on top.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig
)
from peft import PeftModel

base_model_id = "microsoft/phi-2"
adapter_id = "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs"

# template used for fine-tune
# template = """\
# Instruct: {instruction}\n
# Output: {response}"""

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using {torch.cuda.get_device_name(0)}")
    # 4-bit NF4 quantization via bitsandbytes (CUDA only)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=False,
    )
    dtype = torch.float16
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    bnb_config = None
    dtype = torch.float16
else:
    device = torch.device("cpu")
    bnb_config = None
    dtype = torch.float32  # float16 is slow or unsupported for many CPU ops
    print("No GPU available, using CPU instead.")

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=dtype,
    quantization_config=bnb_config,
    # Quantized models are placed on the GPU at load time; .to() is not supported for them
    device_map={"": 0} if bnb_config is not None else None,
)
model = PeftModel.from_pretrained(model, adapter_id)
if bnb_config is None:
    model = model.to(device)
model.eval()

prompt = "Instruct: What is the capital of France? \nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
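Because `batch_decode` returns the prompt together with the completion (and, unless `skip_special_tokens=True` is passed, phi-2's `<|endoftext|>` token), it can help to wrap prompt construction and answer extraction in two small helpers. The sketch below is an assumption-based convenience, not part of the original notebook: `build_prompt` and `extract_response` are hypothetical names, the format mirrors the prompt in the usage example (the commented training template differs slightly in whitespace), and stopping at a new `Instruct:` turn is a heuristic.

```python
EOS_TOKEN = "<|endoftext|>"  # phi-2's end-of-text token


def build_prompt(instruction: str) -> str:
    """Format an instruction like the prompt in the usage example (hypothetical helper)."""
    return f"Instruct: {instruction}\nOutput:"


def extract_response(decoded: str) -> str:
    """Return only the generated answer from a decoded prompt + completion."""
    # Drop everything up to and including the first "Output:" marker
    _, sep, completion = decoded.partition("Output:")
    if not sep:
        return decoded.strip()
    # Stop at the end-of-text token or at a new turn the model may start on its own
    for stop in (EOS_TOKEN, "\nInstruct:"):
        completion = completion.split(stop, 1)[0]
    return completion.strip()


if __name__ == "__main__":
    prompt = build_prompt("What is the capital of France?")
    # Hypothetical decoded output, used only to show the parsing
    fake_decoded = prompt + " The capital of France is Paris.<|endoftext|>"
    print(extract_response(fake_decoded))
```

In the usage snippet, `text = tokenizer.batch_decode(outputs)[0]` could then be passed to `extract_response(text)`.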

## Intended uses & limitations

This is a LoRA adapter fine-tune for phi-2, not a full fine-tune of the model. Additionally, I did not spend time tuning the hyperparameters.

## Training and evaluation data

The adapter was fine-tuned with DPO on a Google Colab A100 GPU using the [distilabel-intel-orca-dpo-pairs](https://huggingface.co./datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing). The configurations used for the LoRA adapter and the trainer are shown below.

```python
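from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,    # scaling numerator: updates are scaled by lora_alpha / r
    lora_dropout=0.5,
    r=32,             # rank of the low-rank update matrices
    # attention projections and MLP layers of phi-2
    target_modules=['k_proj', 'q_proj', 'v_proj', 'fc1', 'fc2'],
    bias="none",
    task_type="CAUSAL_LM",
)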
```
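To get a feel for how small the adapter is relative to the base model, the arithmetic below counts the trainable LoRA parameters implied by this config. For a linear layer with input size `d_in` and output size `d_out`, LoRA adds two matrices of shapes `r x d_in` and `d_out x r`, that is `r * (d_in + d_out)` parameters, and with `bias="none"` no bias terms are trained. The phi-2 dimensions used here (hidden size 2560, MLP size 10240, 32 layers, full-size key/value projections) are assumptions taken from its published config; check `config.json` before relying on the exact number.

```python
# Assumed phi-2 dimensions (verify against the model's config.json)
HIDDEN = 2560
INTERMEDIATE = 10240
NUM_LAYERS = 32


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters of one LoRA pair: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


r = 32
lora_alpha = 16

# (d_in, d_out) of every targeted module in one decoder layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, INTERMEDIATE),
    "fc2": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
total = per_layer * NUM_LAYERS
scaling = lora_alpha / r  # PEFT's default LoRA scaling factor

print(f"{per_layer:,} per layer, {total:,} total, scaling={scaling}")
```

Under these assumptions the adapter has about 42M trainable parameters, roughly 1.5% of phi-2's 2.7B, and the `lora_alpha=16`, `r=32` combination scales the low-rank update by 0.5.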

```python
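from transformers import TrainingArguments

model_name = "phi2-lora-quantized-distilabel-intel-orca-dpo-pairs"  # assumed output directory name

training_arguments = TrainingArguments(
    output_dir=f"./{model_name}",
    evaluation_strategy="steps",
    do_eval=True,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,  # effective batch size: 2 x 16 = 32
    per_device_eval_batch_size=2,
    log_level="debug",
    save_steps=20,
    logging_steps=20,
    learning_rate=1e-5,
    eval_steps=20,
    num_train_epochs=1, # Modified for tutorial purposes
    # A positive max_steps overrides num_train_epochs; the run reported below used 320 steps (1 epoch)
    max_steps=100,
    warmup_steps=20,
    lr_scheduler_type="linear",
)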
```
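The interaction between batch size, gradient accumulation, `num_train_epochs`, and `max_steps` decides how many optimizer steps the trainer runs. The sketch below reproduces that arithmetic in plain Python under the assumption of a single GPU; the function names and the example dataset size of 10,240 pairs are hypothetical and only chosen to illustrate the calculation.

```python
import math


def effective_batch_size(per_device_bs: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device_bs * grad_accum * num_devices


def total_optimizer_steps(num_examples: int, per_device_bs: int, grad_accum: int,
                          num_epochs: int, max_steps: int = -1, num_devices: int = 1) -> int:
    """Approximate the Trainer's step count; a positive max_steps wins."""
    if max_steps > 0:
        return max_steps
    batches_per_epoch = math.ceil(num_examples / (per_device_bs * num_devices))
    # One optimizer update per grad_accum micro-batches, at least one per epoch
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(num_epochs * steps_per_epoch)


print(effective_batch_size(2, 16))                        # matches total_train_batch_size
print(total_optimizer_steps(10_240, 2, 16, 1))            # hypothetical dataset size
print(total_optimizer_steps(10_240, 2, 16, 1, max_steps=100))
```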

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: paged 8-bit AdamW (`paged_adamw_8bit`) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 20
- num_epochs: 1
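With `lr_scheduler_type: linear` and 20 warmup steps, the learning rate ramps linearly from 0 to 1e-05 and then decays linearly back to 0 at the final step. The function below is a minimal sketch of that schedule, written to mirror the shape of the linear warmup schedule in Transformers rather than to call it; using 320 total steps is an assumption taken from the final row of the results table below.

```python
def linear_warmup_lr(step: int, base_lr: float = 1e-5,
                     warmup_steps: int = 20, total_steps: int = 320) -> float:
    """Learning rate at a given optimizer step (linear warmup, then linear decay)."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


for s in (0, 10, 20, 170, 320):
    print(s, linear_warmup_lr(s))
```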

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6853        | 0.06  | 20   | 0.6701          | 0.0133         | -0.0368          | 0.6905             | 0.0501          | -212.5803      | -202.2522    | 0.3853          | 0.2532        |
| 0.6312        | 0.12  | 40   | 0.5884          | 0.0422         | -0.2208          | 0.8138             | 0.2630          | -214.4207      | -201.9638    | 0.4254          | 0.2816        |
| 0.547         | 0.19  | 60   | 0.5146          | 0.0172         | -0.5786          | 0.8278             | 0.5958          | -217.9983      | -202.2132    | 0.4699          | 0.3110        |
| 0.4388        | 0.25  | 80   | 0.4893          | -0.0808        | -1.0789          | 0.8293             | 0.9981          | -223.0014      | -203.1934    | 0.5158          | 0.3396        |
| 0.4871        | 0.31  | 100  | 0.4818          | -0.1298        | -1.2346          | 0.8297             | 1.1048          | -224.5586      | -203.6837    | 0.5133          | 0.3340        |
| 0.4863        | 0.37  | 120  | 0.4723          | -0.1230        | -1.1718          | 0.8301             | 1.0488          | -223.9305      | -203.6159    | 0.4910          | 0.3167        |
| 0.4578        | 0.44  | 140  | 0.4666          | -0.1257        | -1.1772          | 0.8301             | 1.0515          | -223.9844      | -203.6428    | 0.4795          | 0.3078        |
| 0.4587        | 0.5   | 160  | 0.4625          | -0.0746        | -1.1272          | 0.8301             | 1.0526          | -223.4841      | -203.1310    | 0.4857          | 0.3139        |
| 0.4688        | 0.56  | 180  | 0.4595          | -0.0584        | -1.1194          | 0.8297             | 1.0610          | -223.4062      | -202.9692    | 0.4890          | 0.3171        |
| 0.4189        | 0.62  | 200  | 0.4579          | -0.0666        | -1.1647          | 0.8297             | 1.0982          | -223.8598      | -203.0511    | 0.4858          | 0.3138        |
| 0.4392        | 0.68  | 220  | 0.4564          | -0.0697        | -1.1915          | 0.8301             | 1.1219          | -224.1278      | -203.0823    | 0.4824          | 0.3110        |
| 0.4659        | 0.75  | 240  | 0.4554          | -0.0826        | -1.2245          | 0.8301             | 1.1419          | -224.4574      | -203.2112    | 0.4761          | 0.3052        |
| 0.4075        | 0.81  | 260  | 0.4544          | -0.0823        | -1.2328          | 0.8301             | 1.1504          | -224.5403      | -203.2089    | 0.4749          | 0.3044        |
| 0.4015        | 0.87  | 280  | 0.4543          | -0.0833        | -1.2590          | 0.8301             | 1.1757          | -224.8026      | -203.2188    | 0.4779          | 0.3067        |
| 0.4365        | 0.93  | 300  | 0.4539          | -0.0846        | -1.2658          | 0.8301             | 1.1812          | -224.8702      | -203.2313    | 0.4780          | 0.3067        |
| 0.4589        | 1.0   | 320  | 0.4537          | -0.0837        | -1.2628          | 0.8301             | 1.1791          | -224.8409      | -203.2228    | 0.4773          | 0.3062        |


### Framework versions

- PEFT 0.7.1
- Transformers 4.37.1
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1