---
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
language:
- en
library_name: transformers
---
# Model Card for Model ID
This is a Qwen2.5 0.5B Instruct model fine-tuned on a dataset generated by Monte Carlo Tree Search (MCTS) based sampling.
MCTS was rolled out on a small subset of the GSM8K train split, and the resulting traces and value estimates were used to build the dataset.
Only the last two transformer blocks and the regression (value) head were unfrozen during fine-tuning.
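As a rough sketch of that setup, a small helper that freezes everything except parameters under given name prefixes (the helper and the layer indices below are my own illustration, not code from this repo; for Qwen2.5-0.5B the last two decoder blocks would be something like `model.layers.22.` / `model.layers.23.`, with the value head often named `score.`):

```python
from torch import nn

def unfreeze_only(model: nn.Module, prefixes: tuple[str, ...]) -> int:
    """Freeze every parameter, then unfreeze those whose dotted name
    starts with one of `prefixes`. Returns the number of trainable tensors."""
    n = 0
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(prefixes)
        n += p.requires_grad
    return n

# Toy model standing in for the real network; for the actual checkpoint you
# would pass prefixes like ("model.layers.22.", "model.layers.23.", "score.")
# (assumed names, check `model.named_parameters()` for the real ones).
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 1))
unfreeze_only(toy, ("1.", "2."))   # keep only the last two layers trainable
print([n for n, p in toy.named_parameters() if p.requires_grad])
```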
The idea is to use the value network alone to guide MCTS sampling, without the need for simulating full rollouts.
Currently the value network is overfitting due to the very limited number of samples. I'll update this soon, once I've sampled more data.
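To illustrate the idea of rollout-free MCTS, here is a minimal pure-Python toy (everything here is a stand-in I made up for illustration: `stub_value` plays the role of the fine-tuned value network, `expand` plays the role of the policy model proposing continuations, and states are just tuples of small ints rather than token sequences):

```python
import math

def stub_value(state):
    """Stand-in for the value network: scores a partial solution."""
    return sum(state) / (len(state) + 1)

def expand(state):
    """Stand-in for the policy model proposing candidate next tokens."""
    return [state + (t,) for t in (0, 1, 2)]

def value_guided_search(root, iters=50, c=1.4):
    """MCTS-style search where leaf evaluation calls the value network
    directly instead of running a simulated rollout."""
    N = {root: 0}       # visit counts
    W = {root: 0.0}     # accumulated value
    children = {}

    for _ in range(iters):
        # Selection: descend by UCT until we hit an unexpanded node.
        path = [root]
        node = root
        while node in children:
            node = max(
                children[node],
                key=lambda ch: (W[ch] / N[ch] if N[ch] else float("inf"))
                + c * math.sqrt(math.log(N[node] + 1) / (N[ch] + 1)),
            )
            path.append(node)
        # Expansion (depth-bounded for this toy example).
        if len(node) < 4:
            children[node] = expand(node)
            for ch in children[node]:
                N.setdefault(ch, 0)
                W.setdefault(ch, 0.0)
        # Evaluation: value network in place of a rollout.
        v = stub_value(node)
        # Backpropagation along the selected path.
        for n in path:
            N[n] += 1
            W[n] += v

    # Return the most-visited first move.
    return max(children[root], key=lambda ch: N[ch])

print(value_guided_search(()))
```

The only change versus textbook MCTS is the evaluation step: no simulation phase, just one forward pass of the value head per leaf.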
### Scores on the first 65 samples of the GSM8K test split
- Beam search (3 beams): 40.0%
- MCTS search (3 beams): 50.77%
The final rollout of the MCTS search is also done via beam search. During testing on GSM8K, only the value network was used to guide the search.
All tests were run with Qwen2.5 0.5B Instruct.
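For completeness, value-guided beam search can be sketched in the same toy setting (again, `stub_value` and `expand` are hypothetical stand-ins for the value head and the policy model's candidate continuations; the point is that candidates are ranked by the value network rather than by token log-probabilities):

```python
def stub_value(seq):
    """Stand-in for the value head: score a partial sequence."""
    return sum(seq)

def expand(seq):
    """Stand-in for the policy model proposing next tokens."""
    return [seq + (t,) for t in (0, 1, 2)]

def value_guided_beam_search(root, beams=3, steps=4):
    frontier = [root]
    for _ in range(steps):
        candidates = [c for seq in frontier for c in expand(seq)]
        # Keep the `beams` highest-scoring candidates according to
        # the value network, not the language-model likelihood.
        frontier = sorted(candidates, key=stub_value, reverse=True)[:beams]
    return frontier[0]

print(value_guided_beam_search(()))
```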