Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 5 days ago • 42
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 11 days ago • 22
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published 13 days ago • 48
ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models Paper • 2406.20015 • Published Jun 28, 2024 • 1
HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing Paper • 2406.11683 • Published Jun 17, 2024
Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Language Models Paper • 2410.03212 • Published Oct 4, 2024
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective Paper • 2501.11110 • Published 15 days ago • 2