Update README.md
README.md
@@ -25,4 +25,8 @@ Currently the value network is overfitting, due to very limited samples. Going t
 - MCTS-search (3 beams): 50.77%
 
 
-The final rollout of the MCTS-search is done also via Beam-serach. During testing on gsm8k, only the value network was used to guide the search.
+The final rollout of the MCTS-search is done also via Beam-serach. During testing on gsm8k, only the value network was used to guide the search.
+
+
+
+All tests were done with Qwen2.5 0.5B Instruct.