- LLaVA-o1: Let Vision Language Models Reason Step-by-Step
  Paper • 2411.10440 • Published • 113
- ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?
  Paper • 2411.06469 • Published • 17
- Sharingan: Extract User Action Sequence from Desktop Recordings
  Paper • 2411.08768 • Published • 10
- AnimateAnything: Consistent and Controllable Animation for Video Generation
  Paper • 2411.10836 • Published • 23
Collections including paper arxiv:2411.06469
- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 57
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 51
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 41
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 53

- Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs
  Paper • 2501.01644 • Published • 1
- Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain
  Paper • 2412.20309 • Published
- On the Compositional Generalization of Multimodal LLMs for Medical Imaging
  Paper • 2412.20070 • Published • 43
- MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes
  Paper • 2412.19260 • Published • 1