Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published 24 days ago • 40
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2 • 40
WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents Paper • 2410.07484 • Published Oct 9 • 48
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition Paper • 2410.05603 • Published Oct 8 • 11
Agent S: An Open Agentic Framework that Uses Computers Like a Human Paper • 2410.08164 • Published about 1 month ago • 24
Foundation AI Papers Collection Curated List of Must-Reads on LLM reasoning at Temus AI team • 135 items • Updated Jun 15 • 26
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4 • 32
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction Paper • 2410.01273 • Published Oct 2 • 8
Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak Paper • 2409.04269 • Published Sep 6 • 9
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments Paper • 2409.05865 • Published Sep 9 • 14
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23 • 25
Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model Paper • 2406.15275 • Published Jun 21 • 10
Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models Paper • 2406.14035 • Published Jun 20 • 12
Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study Paper • 2403.03186 • Published Mar 5 • 5
VideoGUI: A Benchmark for GUI Automation from Instructional Videos Paper • 2406.10227 • Published Jun 14 • 9
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices Paper • 2406.08451 • Published Jun 12 • 23
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Paper • 2406.10118 • Published Jun 14 • 27
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model Paper • 2402.07827 • Published Feb 12 • 45