---
base_model: Qwen/Qwen2-0.5B-Instruct
datasets: trl-lib/ultrafeedback-prompt
library_name: transformers
model_name: online-dpo-qwen2-3
tags:
- trl
- online-dpo
- generated_from_trainer
licence: license
---

# Model Card for online-dpo-qwen2-3

This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the [trl-lib/ultrafeedback-prompt](https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt) dataset.