TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
Abstract
Understanding domain-specific theorems often requires more than just text-based reasoning; effective communication through structured visual explanations is crucial for deeper comprehension. While large language models (LLMs) demonstrate strong performance in text-based theorem reasoning, their ability to generate coherent and pedagogically meaningful visual explanations remains an open challenge. In this work, we introduce TheoremExplainAgent, an agentic approach for generating long-form theorem explanation videos (over 5 minutes) using Manim animations. To systematically evaluate multimodal theorem explanations, we propose TheoremExplainBench, a benchmark covering 240 theorems across multiple STEM disciplines, along with 5 automated evaluation metrics. Our results reveal that agentic planning is essential for generating detailed long-form videos, and the o3-mini agent achieves a success rate of 93.8% and an overall score of 0.77. However, our quantitative and qualitative studies show that most of the videos produced exhibit minor issues with visual element layout. Furthermore, multimodal explanations expose deeper reasoning flaws that text-based explanations fail to reveal, highlighting the importance of multimodal explanations.
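For concreteness, the abstract does not reproduce any agent output, but the videos are built from programmatic Manim scenes. Below is a minimal, hypothetical sketch (not taken from the paper) of the kind of scene such an agent might emit for a single segment of an explanation video, assuming the standard Manim Community Edition Python API; the scene name and coordinates are illustrative only.

```python
from manim import Scene, MathTex, Polygon, Write, Create, UP, LEFT, DOWN, BLUE

class PythagoreanTheoremScene(Scene):
    def construct(self):
        # State the theorem as rendered LaTeX and pin it to the top of the frame.
        statement = MathTex(r"a^2 + b^2 = c^2").to_edge(UP)
        self.play(Write(statement))

        # Draw a concrete right triangle (coordinates are in scene units).
        triangle = Polygon([-2, -1.5, 0], [2, -1.5, 0], [-2, 1.5, 0], color=BLUE)
        self.play(Create(triangle))

        # Label the two legs and the hypotenuse relative to the triangle.
        label_a = MathTex("a").next_to(triangle, LEFT)
        label_b = MathTex("b").next_to(triangle, DOWN)
        label_c = MathTex("c").move_to([0.4, 0.3, 0])
        self.play(Write(label_a), Write(label_b), Write(label_c))

        # Hold the final frame so narration can catch up.
        self.wait(2)
```

A scene like this would be rendered with the standard Manim CLI, e.g. `manim -pql theorem_scene.py PythagoreanTheoremScene` (filename hypothetical); a full long-form video would chain many such scenes with synchronized narration.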
Community
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- VisPath: Automated Visualization Code Synthesis via Multi-Path Reasoning and Feedback-Driven Optimization (2025)
- Theorem Prover as a Judge for Synthetic Data Generation (2025)
- Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models (2025)
- LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning (2025)
- Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT (2025)
- PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning (2025)
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy (2025)