MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published 2 days ago • 42
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published 1 day ago • 9
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance Paper • 2502.16944 • Published 5 days ago • 9
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning Paper • 2502.19735 • Published 2 days ago • 6
Mobius: Text to Seamless Looping Video Generation via Latent Shift Paper • 2502.20307 • Published 1 day ago • 7
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Paper • 2502.20126 • Published 1 day ago • 9
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published 1 day ago • 13
SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning Paper • 2502.20127 • Published 1 day ago • 7
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Paper • 2502.20395 • Published 1 day ago • 32
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published 2 days ago • 20
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model Paper • 2502.18906 • Published 3 days ago • 8
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published 2 days ago • 17
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published 2 days ago • 34
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Paper • 2502.17422 • Published 4 days ago • 4