---
base_model: slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1
datasets:
- slm-research-vn/dpo-format-function-calling-v2
- slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4
- argilla/dpo-mix-7k
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: Qwen2-7B-Instruct-SPPO-Function-call-v2.4
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Qwen2-7B-Instruct-SPPO-Function-call-v2.4

This model is a fine-tuned version of [slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1](https://huggingface.co./slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1) on the slm-research-vn/dpo-format-function-calling-v2, the slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4 and the argilla/dpo-mix-7k datasets.
It achieves the following results on the evaluation set:
- Loss: 0.4345
- Rewards/chosen: 1.3033
- Rewards/rejected: 0.2776
- Rewards/accuracies: 0.8185
- Rewards/margins: 1.0258
- Logps/rejected: -333.5228
- Logps/chosen: -261.0424
- Logits/rejected: -0.7224
- Logits/chosen: -0.7089

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6782        | 0.1270 | 100  | 0.6611          | 0.1038         | 0.0272           | 0.8000             | 0.0766          | -338.5302      | -285.0340    | -0.7425         | -0.7284       |
| 0.5811        | 0.2540 | 200  | 0.5409          | 0.5575         | 0.1395           | 0.8370             | 0.4180          | -336.2845      | -275.9589    | -0.7306         | -0.6945       |
| 0.5484        | 0.3811 | 300  | 0.4777          | 0.9393         | 0.2286           | 0.8000             | 0.7107          | -334.5019      | -268.3231    | -0.7283         | -0.7031       |
| 0.4531        | 0.5081 | 400  | 0.4535          | 1.1283         | 0.2592           | 0.8296             | 0.8690          | -333.8891      | -264.5439    | -0.7170         | -0.6879       |
| 0.4577        | 0.6351 | 500  | 0.4415          | 1.2504         | 0.2849           | 0.8148             | 0.9655          | -333.3753      | -262.1006    | -0.7146         | -0.6865       |
| 0.4715        | 0.7621 | 600  | 0.4364          | 1.2963         | 0.2864           | 0.8148             | 1.0099          | -333.3469      | -261.1842    | -0.7175         | -0.6913       |
| 0.4508        | 0.8892 | 700  | 0.4348          | 1.2990         | 0.2819           | 0.8222             | 1.0172          | -333.4369      | -261.1283    | -0.7185         | -0.6937       |


### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1