Update README.md
README.md
CHANGED
@@ -30,7 +30,7 @@ language:
 </div>
 
 ## About
-Developed by **Menlo Research**, **AlphaMaze** is a novel model for evaluating and enhancing visual reasoning in
+Developed by **Menlo Research**, **AlphaMaze** is a novel model for evaluating and enhancing visual reasoning in LLMs. AlphaMaze challenges models with a deceptively simple task: solving mazes presented entirely in text. We further enhance AlphaMaze's capabilities using the GRPO (Group Relative Policy Optimization) method.
 
 Prior research, like [Microsoft's "Multimodal Visualization-of-Thought (MVoT)"](https://arxiv.org/abs/2501.07542), explored visual reasoning through image generation. But AlphaMaze takes a different, more focused path. We believe that if a model can internally reconstruct a maze from a text description and use that *mental map* to plan its moves, it demonstrates a genuine capacity for visual reasoning – even without generating a single image. AlphaMaze moves beyond the limitations of multiple-choice evaluations, providing a richer, more nuanced assessment of a model's spatial understanding. We're not just testing if a model *can* solve a maze; we're revealing *how* it thinks about space.
 