---
base_model: slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1
datasets:
- slm-research-vn/dpo-format-function-calling-v2
- >-
    slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4
- argilla/dpo-mix-7k
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: Qwen2-7B-Instruct-SPPO-Function-call-v2.4
  results: []
---

# Qwen2-7B-Instruct-SPPO-Function-call-v2.4
This model is a DPO fine-tuned version of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1, trained on the slm-research-vn/dpo-format-function-calling-v2, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k datasets. It achieves the following results on the evaluation set (a loading sketch follows the metrics):
- Loss: 0.4345
- Rewards/chosen: 1.3033
- Rewards/rejected: 0.2776
- Rewards/accuracies: 0.8185
- Rewards/margins: 1.0258
- Logps/rejected: -333.5228
- Logps/chosen: -261.0424
- Logits/rejected: -0.7224
- Logits/chosen: -0.7089
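
Since the card lists `peft` as the library, the snippet below is a minimal, untested sketch of loading this adapter on top of its base model with `peft` and `transformers`. The adapter repository id, dtype, and example prompt are assumptions rather than details taken from this card.

```python
# Minimal sketch: loading this PEFT adapter on top of its base model.
# The adapter repo id is assumed to match the model name on this card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1"
adapter_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.4"  # assumed

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "What is the capital of Vietnam?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is more convenient, `model.merge_and_unload()` folds the adapter into the base weights after loading.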
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
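
For orientation, the sketch below shows roughly how these hyperparameters would map onto TRL's `DPOConfig`/`DPOTrainer`. The dataset choice, LoRA shape, `beta`, precision, and output directory are illustrative assumptions rather than values reported on this card, and argument names can differ slightly between TRL releases.

```python
# Sketch of how the hyperparameters above map onto TRL's DPOConfig/DPOTrainer.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# One of the three preference mixes listed on this card.
train_dataset = load_dataset("argilla/dpo-mix-7k", split="train")

args = DPOConfig(
    output_dir="qwen2-7b-dpo",
    learning_rate=5e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,  # 1 per device x 8 GPUs x 4 = total train batch size 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                       # assumption; the card does not state beta
    bf16=True,                      # assumption; precision is not listed above
)

# LoRA shape is an assumption; the card only indicates that PEFT was used.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,            # newer TRL releases use processing_class= instead
    peft_config=peft_config,
)
trainer.train()
```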
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6782        | 0.1270 | 100  | 0.6611          | 0.1038         | 0.0272           | 0.8000             | 0.0766          | -338.5302      | -285.0340    | -0.7425         | -0.7284       |
| 0.5811        | 0.2540 | 200  | 0.5409          | 0.5575         | 0.1395           | 0.8370             | 0.4180          | -336.2845      | -275.9589    | -0.7306         | -0.6945       |
| 0.5484        | 0.3811 | 300  | 0.4777          | 0.9393         | 0.2286           | 0.8000             | 0.7107          | -334.5019      | -268.3231    | -0.7283         | -0.7031       |
| 0.4531        | 0.5081 | 400  | 0.4535          | 1.1283         | 0.2592           | 0.8296             | 0.8690          | -333.8891      | -264.5439    | -0.7170         | -0.6879       |
| 0.4577        | 0.6351 | 500  | 0.4415          | 1.2504         | 0.2849           | 0.8148             | 0.9655          | -333.3753      | -262.1006    | -0.7146         | -0.6865       |
| 0.4715        | 0.7621 | 600  | 0.4364          | 1.2963         | 0.2864           | 0.8148             | 1.0099          | -333.3469      | -261.1842    | -0.7175         | -0.6913       |
| 0.4508        | 0.8892 | 700  | 0.4348          | 1.2990         | 0.2819           | 0.8222             | 1.0172          | -333.4369      | -261.1283    | -0.7185         | -0.6937       |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1