Asynchronous RLHF
Collection
Models and datasets for the Asynchronous RLHF paper; see the code at https://github.com/mnoukhov/async_rlhf
This model is a fine-tuned version of EleutherAI/pythia-2.8b-deduped on an unknown dataset. It achieves the following results on the evaluation set:
More information needed
The following hyperparameters were used during training:
Training results:

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.3714 | 0.2007 | 183 | 2.2949 |
| 2.2873 | 0.4013 | 366 | 2.2773 |
| 2.2732 | 0.6020 | 549 | 2.2656 |
| 2.2562 | 0.8026 | 732 | 2.2578 |
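
As a rough reading aid for the loss values above, the sketch below converts each validation loss to a perplexity and estimates the number of optimizer steps per epoch from the Epoch and Step columns. It assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state; the helper names `perplexity` and `steps_per_epoch` are illustrative and not part of the released code.

```python
import math

# Rows copied from the results table: (training loss, epoch, step, validation loss).
rows = [
    (2.3714, 0.2007, 183, 2.2949),
    (2.2873, 0.4013, 366, 2.2773),
    (2.2732, 0.6020, 549, 2.2656),
    (2.2562, 0.8026, 732, 2.2578),
]


def perplexity(loss: float) -> float:
    """Perplexity for a loss value.

    Assumption: the loss is a mean per-token cross-entropy in nats.
    """
    return math.exp(loss)


def steps_per_epoch(epoch: float, step: int) -> float:
    """Estimate steps in one full epoch from a (fractional epoch, step) pair."""
    return step / epoch


for train_loss, epoch, step, val_loss in rows:
    print(
        f"step {step}: validation perplexity {perplexity(val_loss):.2f}, "
        f"about {steps_per_epoch(epoch, step):.0f} steps per epoch"
    )
```

Under that assumption, every row implies roughly the same epoch length, and the validation loss falls at each of the four evaluation points.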