# Llama-3 8B RLHF checkpoint trained by OpenRLHF

This checkpoint was trained with the following models and datasets:
- Base SFT model: https://huggingface.co./OpenLLMAI/Llama-3-8b-sft-mixture
- Reward model: https://huggingface.co./OpenLLMAI/Llama-3-8b-rm-mixture
- Prompt dataset: https://huggingface.co./datasets/OpenLLMAI/prompt-collection-v0.1
## Training Hyperparameters

- Actor Learning Rate: 5e-7
- Critic Learning Rate: 9e-6
- Learning Rate Scheduler: cosine with 0.03 warmup ratio
- PPO Epochs: 1
- Training Batch Size: 128
- Experience Buffer Size: 1024
- Reward Normalization: True
- Max Prompt Length: 2048
- Max Response Length: 2048
- Max Samples: 100k (to save GPU resources)
- Number of Samples per Prompt: 1
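For reference, a launch command in the style of OpenRLHF's `train_ppo` CLI that reflects the settings above might look like the sketch below. This is not the exact command used for this checkpoint: the micro batch sizes, ZeRO stage, save path, and precision/efficiency flags are assumptions, and flag names can differ across OpenRLHF versions.

```bash
# Sketch of an OpenRLHF PPO launch matching the hyperparameters above.
# Assumptions (not stated on this card): micro batch sizes, ZeRO stage,
# save path, and the bf16/flash-attn/gradient-checkpointing flags.
deepspeed --module openrlhf.cli.train_ppo \
  --pretrain OpenLLMAI/Llama-3-8b-sft-mixture \
  --reward_pretrain OpenLLMAI/Llama-3-8b-rm-mixture \
  --prompt_data OpenLLMAI/prompt-collection-v0.1 \
  --save_path ./checkpoint/llama-3-8b-rlhf-100k \
  --actor_learning_rate 5e-7 \
  --critic_learning_rate 9e-6 \
  --lr_warmup_ratio 0.03 \
  --max_epochs 1 \
  --train_batch_size 128 \
  --micro_train_batch_size 2 \
  --rollout_batch_size 1024 \
  --micro_rollout_batch_size 4 \
  --normalize_reward \
  --prompt_max_len 2048 \
  --generate_max_len 2048 \
  --max_samples 100000 \
  --n_samples_per_prompt 1 \
  --zero_stage 3 \
  --bf16 \
  --flash_attn \
  --gradient_checkpointing
```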
## Evaluation

Chat-Arena-Hard:

| Model                | Score |
| -------------------- | ----- |
| llama-3-8b-sft       | 5.6   |
| llama-3-8b-rlhf-100k | 20.5  |
## Training logs
![](https://cdn-uploads.huggingface.co/production/uploads/63f6c04ac96958470d1e9043/iqwD8jBAX1vhu0PT0ycy8.png)
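To try the checkpoint, a minimal generation sketch with `transformers` is shown below. The repository id used here is an assumption inferred from the naming of the SFT and reward models above; substitute this card's actual repo id.

```python
# Minimal inference sketch for this checkpoint using transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenLLMAI/Llama-3-8b-rlhf-100k"  # hypothetical repo id; replace with this card's
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain RLHF in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample a response; the 2048-token cap mirrors the max response
# length used during PPO training.
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True))
```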