Commit 181b0be (verified) · committed by jan-hq · 1 parent: cb3f8e0

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -30,7 +30,7 @@ language:
  </div>

  ## About
- Developed by **Menlo Research**, **AlphaMaze** is a novel model for evaluating and enhancing visual reasoning in large language models (LLMs). AlphaMaze challenges models with a deceptively simple task: solving mazes presented entirely in text. We further enhance AlphaMaze's capabilities using the GRPO (Generalized Relative Policy Optimization) method.
+ Developed by **Menlo Research**, **AlphaMaze** is a novel model for evaluating and enhancing visual reasoning in LLMs. AlphaMaze challenges models with a deceptively simple task: solving mazes presented entirely in text. We further enhance AlphaMaze's capabilities using the GRPO (Generalized Relative Policy Optimization) method.

  Prior research, like [Microsoft's "Multimodal Visualization-of-Thought (MVoT)"](https://arxiv.org/abs/2501.07542), explored visual reasoning through image generation. But AlphaMaze takes a different, more focused path. We believe that if a model can internally reconstruct a maze from a text description and use that *mental map* to plan its moves, it demonstrates a genuine capacity for visual reasoning – even without generating a single image. AlphaMaze moves beyond the limitations of multiple-choice evaluations, providing a richer, more nuanced assessment of a model's spatial understanding. We're not just testing if a model *can* solve a maze; we're revealing *how* it thinks about space.