---
base_model: anthracite-org/magnum-v2-4b
library_name: peft
license: other
tags:
  - llama-factory
  - lora
  - generated_from_trainer
model-index:
  - name: magnum-4b-KTO-test
    results: []
---

# magnum-4b-KTO-test

This model is a LoRA adapter for anthracite-org/magnum-v2-4b, fine-tuned with KTO on the combined_new_22k.json dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 0.5030
  • Rewards/chosen: 0.0007
  • Logps/chosen: -11.2857
  • Rewards/rejected: -0.0006
  • Logps/rejected: -10.6547
  • Rewards/margins: 0.0013
  • Kl: 0.0009
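
Below is a minimal loading sketch using transformers and peft. The adapter repo id is an assumption (replace it with wherever this LoRA is actually hosted), and `device_map="auto"` assumes accelerate is installed.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "anthracite-org/magnum-v2-4b"   # base model from the metadata above
adapter_id = "Delta-Vector/KTO-4B-lora"   # hypothetical adapter repo id; adjust to the actual upload

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # apply the LoRA weights on top of the base model

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this repository contains only a PEFT LoRA adapter, the base model weights are downloaded separately and the adapter is merged onto them at load time.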

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the effective global batch size is derived in the short sketch after the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 48
  • total_train_batch_size: 768
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 2.0
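
The total_train_batch_size above is not an independent setting; it is the product of the per-device batch size, the number of devices, and the gradient accumulation steps:

```python
# Effective global batch size per optimizer step, from the values listed above.
per_device_train_batch_size = 2
num_devices = 8
gradient_accumulation_steps = 48

total_train_batch_size = per_device_train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 768
```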

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Logps/chosen | Rewards/rejected | Logps/rejected | Rewards/margins | Kl     |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:------------:|:----------------:|:--------------:|:---------------:|:------:|
| 0.5042        | 0.2788 | 16   | 0.5038          | 0.0004         | -11.2884     | -0.0004          | -10.6529       | 0.0008          | 0.0022 |
| 0.5037        | 0.5575 | 32   | 0.5033          | 0.0006         | -11.2865     | -0.0008          | -10.6565       | 0.0014          | 0.0013 |
| 0.5035        | 0.8363 | 48   | 0.5041          | 0.0003         | -11.2899     | -0.0006          | -10.6546       | 0.0008          | 0.0016 |
| 0.5037        | 1.1151 | 64   | 0.5035          | 0.0005         | -11.2872     | -0.0005          | -10.6540       | 0.0011          | 0.0017 |
| 0.5036        | 1.3938 | 80   | 0.5036          | 0.0005         | -11.2874     | -0.0005          | -10.6535       | 0.0010          | 0.0010 |
| 0.5032        | 1.6726 | 96   | 0.5035          | 0.0006         | -11.2867     | -0.0005          | -10.6541       | 0.0011          | 0.0012 |
| 0.5036        | 1.9514 | 112  | 0.5037          | 0.0006         | -11.2869     | -0.0006          | -10.6546       | 0.0011          | 0.0009 |

### Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.0.dev0
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1