
🌟 BloomVN-0.5B-ppo

A fine-tuned multilingual model for the Vietnamese language

📋 Overview

This model is a small-scale experiment (0.5B parameters) testing the reinforcement learning (RL) capabilities of the veRL framework. It was trained with PPO (Proximal Policy Optimization) on a limited dataset to evaluate veRL's performance and training behavior.
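For readers unfamiliar with PPO, the following is a minimal, framework-independent sketch of the clipped surrogate objective that PPO optimizes. It is not veRL's implementation. The loss is L = -mean(min(r·A, clip(r, 1-ε, 1+ε)·A)), where r is the ratio between the new and old policy probabilities of each token and A is its advantage. The function name and the clip_eps=0.2 default are assumptions (0.2 is a common choice); the card does not report the hyperparameters used for this run.

```python
import math
from typing import Sequence


def ppo_clipped_loss(logp_new: Sequence[float],
                     logp_old: Sequence[float],
                     advantages: Sequence[float],
                     clip_eps: float = 0.2) -> float:
    """Hypothetical helper: PPO clipped surrogate loss (to be minimized).

    Each element corresponds to one sampled token: its log-probability under
    the current policy, under the policy that generated it, and its advantage.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped objectives
        terms.append(min(ratio * adv, clipped_ratio * adv))
    # Negate the mean objective so that gradient descent maximizes it
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    # Token 1: ratio 1.5 with positive advantage is clipped to 1.2.
    # Token 2: ratio 0.5 with negative advantage is clipped to 0.8.
    loss = ppo_clipped_loss([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0])
    print(round(loss, 6))
```

The clipping removes any incentive to push the probability ratio far from 1 in a single update, which is what keeps PPO updates conservative.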

🔧 Method

Experiments were conducted with veRL, focusing on:

  • Implementing the PPO algorithm with a 0.5B-parameter model
  • Running training experiments on a small dataset
  • Testing how well the veRL framework handles RL tasks
  • Evaluating training efficiency and model behavior
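PPO needs a per-token advantage estimate. In language-model RL setups the reward often arrives only at the final token of a response, and PPO implementations commonly spread it back over the sequence with Generalized Advantage Estimation (GAE) using a learned value (critic) model. The card does not say which advantage estimator this run used, so the sketch below is illustrative only; the function name and the gamma/lam defaults are assumptions.

```python
from typing import List, Sequence, Tuple


def gae_advantages(rewards: Sequence[float],
                   values: Sequence[float],
                   gamma: float = 1.0,
                   lam: float = 0.95) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: Generalized Advantage Estimation for one sequence.

    `values` must have len(rewards) + 1 entries; the last one is the bootstrap
    value after the final step (0.0 for a finished response).
    Returns (advantages, returns), where returns are the critic's targets.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    returns = [a + v for a, v in zip(advantages, values[:-1])]
    return advantages, returns


if __name__ == "__main__":
    # Reward only at the last token, zero value estimates everywhere.
    adv, ret = gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=0.5)
    print(adv, ret)
```

Lower lam reduces variance by relying more on the critic, at the cost of more bias; with lam=1 and gamma=1 every token receives the full final reward minus its value estimate.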

This lightweight approach allowed us to assess veRL's performance in a controlled, small-scale environment.

📊 VMLU Benchmark

Evaluation date | STEM 🔬 | Social Science 🌍 | Humanities 📚 | Others 🎯 | Avg ⭐
07/02/2025      | 23.18   | 32.84             | 32.71         | 33.67     | 29.43
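A quick arithmetic check on the benchmark table: the reported average is not the unweighted mean of the four category scores. The snippet below computes that unweighted mean from the table values. The card does not state how the average is aggregated; weighting by the number of questions in each category is one plausible explanation, but that is an assumption, not something confirmed here.

```python
# Category scores copied from the benchmark table above
scores = {
    "STEM": 23.18,
    "Social Science": 32.84,
    "Humanities": 32.71,
    "Others": 33.67,
}
reported_avg = 29.43

# Simple (unweighted) mean over the four categories
unweighted_mean = sum(scores.values()) / len(scores)
print(f"unweighted mean: {unweighted_mean:.2f}, reported: {reported_avg:.2f}")
```

The unweighted mean comes out to 30.60, higher than the reported 29.43, which is consistent with the low STEM score carrying more weight in whatever aggregation the benchmark uses.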

🤝 Contributors

Developed with ❤️ by BlossomAI


Star ⭐️ this repo if you find it valuable!
📦 Model Details

Model size: 494M params
Tensor type: F32
Format: Safetensors
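As a rough sanity check on the listed size: F32 stores each parameter in 4 bytes, so 494M parameters come to roughly 1.98 GB of raw weights. The sketch also shows the BF16 figure (2 bytes per parameter) for comparison; BF16 is not mentioned in the card and is included only as an illustration. The estimate uses the rounded parameter count and ignores file-format overhead.

```python
def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Raw storage for model weights, ignoring file metadata/overhead."""
    return num_params * bytes_per_param


params = 494_000_000  # "494M params" as listed (a rounded figure)

for dtype, nbytes in [("F32", 4), ("BF16", 2)]:
    size = weight_bytes(params, nbytes)
    # Report both decimal gigabytes and binary gibibytes
    print(f"{dtype}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

This prints about 1.98 GB (1.84 GiB) for F32 and 0.99 GB (0.92 GiB) for BF16.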

Base model: Qwen/Qwen2.5-0.5B (BlossomsAI/BloomVN-0.5B-ppo is fine-tuned from it)
