Model save

Files changed:
- README.md (added, +84 -0)
- adapter_model.safetensors (+1 -1)

README.md (ADDED)

---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
tags:
- trl
- dpo
- llama-factory
- generated_from_trainer
model-index:
- name: Llama-3.1-8B-Instruct-SAA-900
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Llama-3.1-8B-Instruct-SAA-900

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1582
- Rewards/chosen: -0.0113
- Rewards/rejected: -0.0724
- Rewards/accuracies: 0.7889
- Rewards/margins: 0.0611
- Logps/rejected: -0.7239
- Logps/chosen: -0.1129
- Logits/rejected: -0.3937
- Logits/chosen: -0.3373
- Sft Loss: 0.0140
- Odds Ratio Loss: 1.4419
## Model description

More information needed

## Intended uses & limitations

More information needed
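Because this repository ships only a PEFT (LoRA) adapter (`adapter_model.safetensors`, roughly 84 MB) rather than full model weights, it has to be attached to the base model at load time. The snippet below is a minimal loading sketch, not an official usage recipe; the adapter path `"path/to/Llama-3.1-8B-Instruct-SAA-900"` is a placeholder for wherever this adapter is hosted or checked out.

```python
# Minimal sketch: load the base model, then attach this LoRA adapter with PEFT.
# The adapter path below is a placeholder, not the actual repository id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "path/to/Llama-3.1-8B-Instruct-SAA-900")
model.eval()

# Quick generation check using the Llama 3.1 chat template.
messages = [{"role": "user", "content": "Say hello in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```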
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged sketch of an equivalent TRL-style setup follows the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
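The `trl`, `dpo`, and `llama-factory` tags, together with the reported `Sft Loss` and `Odds Ratio Loss` metrics, point to an ORPO-style preference-optimization run, but the actual LLaMA-Factory configuration is not included in this card. The sketch below only shows one plausible way to wire the hyperparameters above into TRL's `ORPOTrainer`; the dataset, LoRA settings, precision, logging cadence, and output path are assumptions, not the values used to train this adapter, and the trainer keyword names may differ slightly across TRL releases.

```python
# Hedged reconstruction only: maps the listed hyperparameters onto a TRL ORPO run.
# Dataset, LoRA config, bf16, logging_steps, and output_dir are placeholders/assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Placeholder preference dataset with "prompt" / "chosen" / "rejected" columns.
train_dataset = load_dataset("json", data_files="preference_pairs.json", split="train")

args = ORPOConfig(
    output_dir="Llama-3.1-8B-Instruct-SAA-900",
    learning_rate=5e-6,                 # learning_rate: 5e-06
    per_device_train_batch_size=2,      # train_batch_size: 2
    per_device_eval_batch_size=2,       # eval_batch_size: 2
    gradient_accumulation_steps=8,      # total_train_batch_size: 16
    num_train_epochs=10.0,              # num_epochs: 10.0
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                   # lr_scheduler_warmup_ratio: 0.1
    seed=42,
    bf16=True,                          # assumption; precision is not stated in the card
    logging_steps=10,                   # assumption
)

# Placeholder LoRA config; the adapter's actual rank and target modules are not listed here.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # older TRL releases; newer ones use processing_class
    peft_config=peft_config,
)
trainer.train()
```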
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Sft Loss | Odds Ratio Loss |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:---------------:|
| 1.5773 | 0.9877 | 50 | 1.3696 | -0.1315 | -0.1754 | 0.7667 | 0.0440 | -1.7544 | -1.3147 | -0.4663 | -0.4034 | 0.1831 | 11.8657 |
| 0.2518 | 1.9753 | 100 | 0.2349 | -0.0190 | -0.0732 | 0.8111 | 0.0542 | -0.7321 | -0.1898 | -0.4483 | -0.3781 | 0.0216 | 2.1323 |
| 0.1304 | 2.9630 | 150 | 0.1530 | -0.0109 | -0.0612 | 0.8111 | 0.0502 | -0.6117 | -0.1094 | -0.4032 | -0.3454 | 0.0131 | 1.3988 |
| 0.1129 | 3.9506 | 200 | 0.1515 | -0.0108 | -0.0582 | 0.8222 | 0.0474 | -0.5819 | -0.1084 | -0.4031 | -0.3480 | 0.0132 | 1.3828 |
| 0.1194 | 4.9383 | 250 | 0.1522 | -0.0109 | -0.0642 | 0.8222 | 0.0533 | -0.6417 | -0.1088 | -0.3982 | -0.3417 | 0.0133 | 1.3891 |
| 0.0898 | 5.9259 | 300 | 0.1535 | -0.0110 | -0.0684 | 0.8111 | 0.0574 | -0.6839 | -0.1101 | -0.3960 | -0.3402 | 0.0136 | 1.3989 |
| 0.0928 | 6.9136 | 350 | 0.1572 | -0.0113 | -0.0679 | 0.7889 | 0.0567 | -0.6794 | -0.1125 | -0.3949 | -0.3394 | 0.0140 | 1.4318 |
| 0.0855 | 7.9012 | 400 | 0.1578 | -0.0112 | -0.0722 | 0.8000 | 0.0609 | -0.7215 | -0.1125 | -0.3935 | -0.3375 | 0.0138 | 1.4394 |
| 0.0985 | 8.8889 | 450 | 0.1574 | -0.0112 | -0.0720 | 0.8000 | 0.0608 | -0.7205 | -0.1122 | -0.3934 | -0.3372 | 0.0138 | 1.4358 |
| 0.0859 | 9.8765 | 500 | 0.1582 | -0.0113 | -0.0724 | 0.7889 | 0.0611 | -0.7239 | -0.1129 | -0.3937 | -0.3373 | 0.0140 | 1.4419 |
### Framework versions

- PEFT 0.12.0
- Transformers 4.45.2
- Pytorch 2.3.0
- Datasets 2.19.0
- Tokenizers 0.20.0
adapter_model.safetensors (CHANGED)

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9c6a94a623523bbeaf5d576fdcda0d16b992e013500214b4f1e6e8b71973b8c4
 size 83945296
```