Submitted by dongguanting 78 We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? · 18 authors 9
Submitted by hba123 61 ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning · 22 authors 6
Submitted by SivilTaram 37 RegMix: Data Mixture as Regression for Language Model Pre-training · 8 authors 7
Submitted by leonardPKU 36 MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation · 16 authors 2
Submitted by AJZhou 25 Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning · 7 authors 4
Submitted by wanghaofan 24 InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation · 6 authors 5
Submitted by Koi953215 24 DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models · 6 authors 5
Submitted by omergoldman 23 Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP · 6 authors 1
Submitted by naoyuki82 23 E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS · 13 authors 4
Submitted by yingtai 20 RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network · 10 authors 2
Submitted by zhwang4ai 13 OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents · 10 authors 5
Submitted by LXT 12 Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language · 7 authors 2
Submitted by Neph0s 12 Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs · 6 authors 2
Submitted by Shijie 11 T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge · 7 authors 1
Submitted by wanchichen 11 Towards Robust Speech Representation Learning for Thousands of Languages · 10 authors 1
Submitted by akhaliq 10 SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix · 8 authors 1
Submitted by davanstrien 9 Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER · 5 authors 1
Submitted by gsarti 6 Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs · 4 authors 4
Submitted by BFauber 6 Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models · 1 author 2
Submitted by iliashum 6 UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI · 9 authors 1
Submitted by hank0316 6 DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging · 4 authors 1
Submitted by JRQi 4 The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models · 7 authors 1