---
base_model: slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1
datasets:
- slm-research-vn/dpo-format-function-calling-v2
- >-
    slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4
- argilla/dpo-mix-7k
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: Qwen2-7B-Instruct-SPPO-Function-call-v2.4
  results: []
---

# Qwen2-7B-Instruct-SPPO-Function-call-v2.4
This model is a DPO fine-tuned version of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1, trained on the slm-research-vn/dpo-format-function-calling-v2, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k datasets. It achieves the following results on the evaluation set (a loading sketch follows the metrics):
- Loss: 0.4345
- Rewards/chosen: 1.3033
- Rewards/rejected: 0.2776
- Rewards/accuracies: 0.8185
- Rewards/margins: 1.0258
- Logps/rejected: -333.5228
- Logps/chosen: -261.0424
- Logits/rejected: -0.7224
- Logits/chosen: -0.7089
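
Since the card lists `peft` as the library, the snippet below is a minimal, untested sketch of loading this adapter on top of its base model with `peft` and `transformers`. The adapter repository id, dtype, and example prompt are assumptions rather than details taken from this card.

```python
# Minimal sketch: loading this PEFT adapter on top of its base model.
# The adapter repo id is assumed to match the model name on this card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1"
adapter_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.4"  # assumed

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "What is the capital of Vietnam?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is more convenient, `model.merge_and_unload()` folds the adapter into the base weights after loading.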
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
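
For orientation, the sketch below shows roughly how these hyperparameters would map onto TRL's `DPOConfig`/`DPOTrainer`. The dataset choice, LoRA shape, `beta`, precision, and output directory are illustrative assumptions rather than values reported on this card, and argument names can differ slightly between TRL releases.

```python
# Sketch of how the hyperparameters above map onto TRL's DPOConfig/DPOTrainer.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# One of the three preference mixes listed on this card.
train_dataset = load_dataset("argilla/dpo-mix-7k", split="train")

args = DPOConfig(
    output_dir="qwen2-7b-dpo",
    learning_rate=5e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,  # 1 per device x 8 GPUs x 4 = total train batch size 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                       # assumption; the card does not state beta
    bf16=True,                      # assumption; precision is not listed above
)

# LoRA shape is an assumption; the card only indicates that PEFT was used.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,            # newer TRL releases use processing_class= instead
    peft_config=peft_config,
)
trainer.train()
```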
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6782        | 0.1270 | 100  | 0.6611          | 0.1038         | 0.0272           | 0.8000             | 0.0766          | -338.5302      | -285.0340    | -0.7425         | -0.7284       |
| 0.5811        | 0.2540 | 200  | 0.5409          | 0.5575         | 0.1395           | 0.8370             | 0.4180          | -336.2845      | -275.9589    | -0.7306         | -0.6945       |
| 0.5484        | 0.3811 | 300  | 0.4777          | 0.9393         | 0.2286           | 0.8000             | 0.7107          | -334.5019      | -268.3231    | -0.7283         | -0.7031       |
| 0.4531        | 0.5081 | 400  | 0.4535          | 1.1283         | 0.2592           | 0.8296             | 0.8690          | -333.8891      | -264.5439    | -0.7170         | -0.6879       |
| 0.4577        | 0.6351 | 500  | 0.4415          | 1.2504         | 0.2849           | 0.8148             | 0.9655          | -333.3753      | -262.1006    | -0.7146         | -0.6865       |
| 0.4715        | 0.7621 | 600  | 0.4364          | 1.2963         | 0.2864           | 0.8148             | 1.0099          | -333.3469      | -261.1842    | -0.7175         | -0.6913       |
| 0.4508        | 0.8892 | 700  | 0.4348          | 1.2990         | 0.2819           | 0.8222             | 1.0172          | -333.4369      | -261.1283    | -0.7185         | -0.6937       |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1