🌟 BloomVN-0.5B-ppo

A fine-tuned multilingual model for the Vietnamese language

📋 Overview

This model is a small-scale experiment (0.5B parameters) that tests the reinforcement learning capabilities of the veRL framework. It was trained with PPO (Proximal Policy Optimization) on a limited dataset to evaluate veRL's performance and training behavior.

🔧 Method

The experiments were run with veRL and focused on:

Implementing the PPO algorithm with a 0.5B-parameter model
Running training experiments on a small dataset
Testing veRL's ability to handle RL tasks
Evaluating training efficiency and model behavior

This lightweight approach allowed us to assess veRL's performance in a controlled, small-scale environment.
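The card does not publish its training code, so the following is only a minimal sketch of the clipped surrogate objective that PPO is generally built around, not veRL's actual implementation. The function name `ppo_clip_loss`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions, and the sketch uses plain Python lists rather than tensors to stay dependency-free.

```python
import math


def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean clipped-surrogate PPO policy loss (hypothetical helper).

    new_logprobs: log-probabilities of the sampled actions under the current policy
    old_logprobs: log-probabilities of the same actions under the rollout policy
    advantages:   advantage estimates for those actions
    clip_eps:     clipping range (0.2 is a common default, assumed here)
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not new_logprobs:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the current and the rollout policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)

    # The objective is maximized, so the loss is its negated mean.
    return -total / len(new_logprobs)
```

Clipping limits how far a single update can move the policy away from the one that generated the samples, which is one reason PPO is a common choice for small, controlled RL experiments like this one.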

📊 VMLU Benchmark

EVALUATION DATE	STEM 🔬	SOCIAL SCIENCE 🌍	HUMANITIES 📚	OTHERS 🎯	AVG ⭐
07/02/2025	23.18	32.84	32.71	33.67	29.43

🤝 Contributors

Developed with ❤️ by BlossomsAI

Star ⭐️ this repo if you find it valuable!
