Trained with the tokenizer of OpenRLHF/Llama-3-8b-sft-mixture.
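
Because the model was trained with that tokenizer, inputs scored by it should presumably be tokenized the same way. The following is a minimal sketch, assuming the `transformers` library is installed and that the tokenizer files can be fetched from the OpenRLHF/Llama-3-8b-sft-mixture repository; the helper names `load_training_tokenizer` and `encode` are hypothetical and not part of the model card.

```python
# Repository named in the model card as the tokenizer source.
TOKENIZER_REPO = "OpenRLHF/Llama-3-8b-sft-mixture"


def load_training_tokenizer(repo_id: str = TOKENIZER_REPO):
    """Load the tokenizer from the given Hub repository.

    Hypothetical helper. The import is kept inside the function so that
    defining it does not require `transformers` or network access.
    """
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def encode(tokenizer, text: str):
    """Return the token IDs for `text` (hypothetical helper)."""
    return tokenizer(text)["input_ids"]


if __name__ == "__main__":
    tok = load_training_tokenizer()  # downloads tokenizer files on first use
    print(encode(tok, "How do I sort a list in Python?"))
```

How the reward model itself should be loaded is not stated on the card, so that step is left out here.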

Downloads last month: 80

Safetensors

Model size: 1.24B params

Tensor type: BF16
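
The parameter count and tensor type together give a rough size for the stored weights. A back-of-the-envelope sketch, assuming 2 bytes per BF16 parameter and treating 1.24B as exactly 1,240,000,000 (the card only gives the rounded figure); it ignores activations and any runtime overhead.

```python
# Rounded parameter count from the model card (assumed exact for illustration).
PARAM_COUNT = 1_240_000_000

# Storage size per parameter for common tensor types, in bytes.
BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP32": 4}


def weight_bytes(num_params: int, dtype: str = "BF16") -> int:
    """Approximate size of the weights alone, in bytes."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    size = weight_bytes(PARAM_COUNT, "BF16")
    print(f"{size / 1e9:.2f} GB ({to_gib(size):.2f} GiB)")
```

Under these assumptions the BF16 weights come to about 2.48 GB, or roughly 2.31 GiB.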

Model tree for RTO-RL/Llama3.2-1B-RewardModel

Base model: meta-llama/Llama-3.2-1B-Instruct

Finetuned: unsloth/Llama-3.2-1B-Instruct

Finetuned (90): this model, RTO-RL/Llama3.2-1B-RewardModel

Dataset used to train RTO-RL/Llama3.2-1B-RewardModel