base_model: Qwen/Qwen2-0.5B-Instruct | |
datasets: dataset_name | |
library_name: transformers | |
model_name: online-dpo-qwen2-3 | |
tags: | |
- trl | |
- online-dpo | |
- generated_from_trainer | |
licence: license | |
# Model Card for online-dpo-qwen2-3 | |
This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co./Qwen/Qwen2-0.5B-Instruct) on the https://huggingface.co./datasets/trl-lib/ultrafeedback-prompt dataset. |