
A fine-tuned multilingual model for the Vietnamese language
Overview
This model is a small-scale experiment (0.5B parameters) testing the reinforcement learning capabilities of the veRL framework. The implementation uses PPO (Proximal Policy Optimization) on a limited training dataset to evaluate veRL's performance and training behavior.
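For reference, below is a minimal usage sketch with the Hugging Face transformers library. The repository ID is a hypothetical placeholder (substitute this model's actual ID), and the prompt and generation settings are only illustrative.

```python
# Minimal inference sketch using Hugging Face transformers.
# NOTE: "BlossomAI/<model-id>" is a hypothetical placeholder, not the real repository ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BlossomAI/<model-id>"  # replace with the actual repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A simple Vietnamese prompt ("Hello, please introduce yourself briefly.")
prompt = "Xin chào, hãy giới thiệu ngắn gọn về bạn."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```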
Method
Experiments were conducted with veRL, focusing on:
- Implementing the PPO algorithm with a 0.5B parameter model
- Running training experiments on a small dataset
- Testing the veRL framework's capabilities in handling RL tasks
- Evaluating training efficiency and model behavior
This lightweight approach allowed us to assess veRL's performance in a controlled, small-scale environment; a sketch of such a run is shown below.
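The snippet below is a minimal sketch of launching a small-scale PPO run through veRL's Hydra-configured entry point (`verl.trainer.main_ppo`). The dataset paths and the 0.5B base model are placeholders/assumptions for illustration, and exact configuration keys may vary between veRL versions; this is not the actual training configuration used for this model.

```python
# Illustrative launch of a small-scale veRL PPO run via its Hydra-style CLI.
# Paths and the base-model choice below are placeholders/assumptions.
import subprocess

overrides = [
    "data.train_files=/path/to/train.parquet",   # small training split (placeholder)
    "data.val_files=/path/to/val.parquet",       # validation split (placeholder)
    "actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct",  # assumed 0.5B base model
    "critic.model.path=Qwen/Qwen2.5-0.5B-Instruct",
    "trainer.n_gpus_per_node=1",
    "trainer.total_epochs=1",
]

# verl.trainer.main_ppo is veRL's PPO entry point; overrides follow the
# key=value convention of its Hydra configs.
subprocess.run(["python3", "-m", "verl.trainer.main_ppo", *overrides], check=True)
```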
VMLU Benchmark
| Evaluation Date | STEM | Social Science | Humanities | Others | Avg |
|---|---|---|---|---|---|
| 07/02/2025 | 23.18 | 32.84 | 32.71 | 33.67 | 29.43 |
Contributors
Developed with ❤️ by BlossomAI
Star ⭐ this repo if you find it valuable!