log_sage_reward_model

This model is a fine-tuned version of distilbert/distilbert-base-uncased on the hdfs_rlhf_log_summary_dataset dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4242
  • Accuracy: 1.0
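
This is a reward model: it maps a piece of text (here, a candidate summary of HDFS logs) to a single scalar score. Assuming the checkpoint uses a one-logit sequence-classification head, the usual layout for reward models trained in the Hugging Face ecosystem, a minimal scoring sketch looks like the following; the example summary text is hypothetical.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "IrwinD/log_sage_reward_model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Hypothetical candidate log summary; replace with your own text.
summary = "Block replication finished; 2 datanodes reported transient errors."
inputs = tokenizer(summary, return_tensors="pt", truncation=True)

with torch.no_grad():
    # For a reward model the single logit is read directly as the score:
    # higher means the summary is preferred.
    score = model(**inputs).logits.squeeze().item()
print(score)
```

Scores like this are only meaningful relative to one another, e.g. for ranking two candidate summaries of the same logs.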

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1.41e-05
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
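
The learning rate of 1.41e-05 and the reported accuracy metric are typical of reward-model training recipes such as TRL's RewardTrainer, though the card does not say which script was used. As an illustration only, the sketch below shows how the listed hyperparameters would map onto transformers.TrainingArguments; dataset preparation and the model/trainer setup are omitted because the card does not document them.

```python
from transformers import TrainingArguments

# Assumed mapping of the reported hyperparameters; not the author's script.
training_args = TrainingArguments(
    output_dir="log_sage_reward_model",
    learning_rate=1.41e-5,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    gradient_accumulation_steps=16,  # 4 x 16 = total_train_batch_size 64
    num_train_epochs=100,
    lr_scheduler_type="linear",
    seed=42,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the transformers
    # default optimizer, so no extra arguments are needed for it.
)
```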

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 1 | 0.6936 | 0.8 |
| No log | 2.0 | 3 | 0.6931 | 0.8 |
| No log | 3.0 | 5 | 0.6928 | 1.0 |
| No log | 4.0 | 6 | 0.6927 | 1.0 |
| No log | 5.0 | 8 | 0.6923 | 1.0 |
| 0.2849 | 6.0 | 10 | 0.6915 | 1.0 |
| 0.2849 | 7.0 | 11 | 0.6908 | 1.0 |
| 0.2849 | 8.0 | 13 | 0.6889 | 1.0 |
| 0.2849 | 9.0 | 15 | 0.6838 | 1.0 |
| 0.2849 | 10.0 | 16 | 0.6788 | 1.0 |
| 0.2849 | 11.0 | 18 | 0.6633 | 1.0 |
| 0.2669 | 12.0 | 20 | 0.6464 | 1.0 |
| 0.2669 | 13.0 | 21 | 0.6422 | 1.0 |
| 0.2669 | 14.0 | 23 | 0.6312 | 1.0 |
| 0.2669 | 15.0 | 25 | 0.5991 | 1.0 |
| 0.2669 | 16.0 | 26 | 0.5796 | 1.0 |
| 0.2669 | 17.0 | 27 | 0.5571 | 1.0 |
| 0.2669 | 18.0 | 29 | 0.5255 | 1.0 |
| 0.2252 | 19.0 | 31 | 0.5055 | 1.0 |
| 0.2252 | 20.0 | 32 | 0.4967 | 1.0 |
| 0.2252 | 21.0 | 34 | 0.4841 | 1.0 |
| 0.2252 | 22.0 | 36 | 0.4742 | 1.0 |
| 0.2252 | 23.0 | 37 | 0.4700 | 1.0 |
| 0.2252 | 24.0 | 39 | 0.4633 | 1.0 |
| 0.1245 | 25.0 | 41 | 0.4573 | 1.0 |
| 0.1245 | 26.0 | 42 | 0.4547 | 1.0 |
| 0.1245 | 27.0 | 44 | 0.4501 | 1.0 |
| 0.1245 | 28.0 | 46 | 0.4462 | 1.0 |
| 0.1245 | 29.0 | 47 | 0.4444 | 1.0 |
| 0.1245 | 30.0 | 49 | 0.4415 | 1.0 |
| 0.0996 | 31.0 | 51 | 0.4390 | 1.0 |
| 0.0996 | 32.0 | 52 | 0.4378 | 1.0 |
| 0.0996 | 33.0 | 53 | 0.4368 | 1.0 |
| 0.0996 | 34.0 | 55 | 0.4349 | 1.0 |
| 0.0996 | 35.0 | 57 | 0.4333 | 1.0 |
| 0.0996 | 36.0 | 58 | 0.4326 | 1.0 |
| 0.0862 | 37.0 | 60 | 0.4315 | 1.0 |
| 0.0862 | 38.0 | 62 | 0.4306 | 1.0 |
| 0.0862 | 39.0 | 63 | 0.4301 | 1.0 |
| 0.0862 | 40.0 | 65 | 0.4294 | 1.0 |
| 0.0862 | 41.0 | 67 | 0.4288 | 1.0 |
| 0.0862 | 42.0 | 68 | 0.4285 | 1.0 |
| 0.0765 | 43.0 | 70 | 0.4281 | 1.0 |
| 0.0765 | 44.0 | 72 | 0.4276 | 1.0 |
| 0.0765 | 45.0 | 73 | 0.4272 | 1.0 |
| 0.0765 | 46.0 | 75 | 0.4265 | 1.0 |
| 0.0765 | 47.0 | 77 | 0.4261 | 1.0 |
| 0.0765 | 48.0 | 78 | 0.4259 | 1.0 |
| 0.0765 | 49.0 | 79 | 0.4257 | 1.0 |
| 0.0783 | 50.0 | 81 | 0.4253 | 1.0 |
| 0.0783 | 51.0 | 83 | 0.4250 | 1.0 |
| 0.0783 | 52.0 | 84 | 0.4249 | 1.0 |
| 0.0783 | 53.0 | 86 | 0.4247 | 1.0 |
| 0.0783 | 54.0 | 88 | 0.4246 | 1.0 |
| 0.0783 | 55.0 | 89 | 0.4245 | 1.0 |
| 0.0652 | 56.0 | 91 | 0.4244 | 1.0 |
| 0.0652 | 57.0 | 93 | 0.4243 | 1.0 |
| 0.0652 | 58.0 | 94 | 0.4243 | 1.0 |
| 0.0652 | 59.0 | 96 | 0.4242 | 1.0 |
| 0.0652 | 60.0 | 98 | 0.4242 | 1.0 |
| 0.0652 | 61.0 | 99 | 0.4242 | 1.0 |
| 0.0655 | 61.09 | 100 | 0.4242 | 1.0 |

Framework versions

  • Transformers 4.39.0
  • PyTorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2