Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2411.10440

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 37
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 45
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published 27 days ago • 35
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 45

Multimodal Dataset

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

Paper • 2407.09413 • Published Jul 12, 2024 • 10
MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11, 2024 • 32
Kvasir-VQA: A Text-Image Pair GI Tract Dataset

Paper • 2409.01437 • Published Sep 2, 2024 • 71
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

Paper • 2409.05840 • Published Sep 9, 2024 • 47

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 182
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 49
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 41

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 113

Video Creation by Demonstration

Paper • 2412.09551 • Published Dec 12, 2024 • 8
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 45
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Paper • 2412.06531 • Published Dec 9, 2024 • 71
APOLLO: SGD-like Memory, AdamW-level Performance

Paper • 2412.05270 • Published Dec 6, 2024 • 38

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 113

Trending Papers - November ✨

Most upvoted paper on the Daily Papers

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 113
LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 113
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Paper • 2411.14405 • Published Nov 21, 2024 • 58
ROICtrl: Boosting Instance Control for Visual Generation

Paper • 2411.17949 • Published Nov 27, 2024 • 82

Running on Zero

36

💬

Llama 3.2V 11B Cot
Xkev/Llama-3.2V-11B-cot

Image-Text-to-Text • Updated Dec 16, 2024 • 7.81k • 141
Xkev/LLaVA-CoT-100k

Viewer • Updated Nov 27, 2024 • 98.6k • 1.45k • 67
LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 113

Nov 22 Releases ❄️

mistralai/Pixtral-Large-Instruct-2411

Image-Text-to-Text • Updated about 1 month ago • 4 • 391
microsoft/orca-agentinstruct-1M-v1

Viewer • Updated Nov 1, 2024 • 1.05M • 8.41k • 414
Xkev/Llama-3.2V-11B-cot

Image-Text-to-Text • Updated Dec 16, 2024 • 7.81k • 141
jinaai/jina-clip-v2

Zero-Shot Image Classification • Updated 13 days ago • 38.9k • 169

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 113
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?

Paper • 2411.06469 • Published Nov 10, 2024 • 17
Sharingan: Extract User Action Sequence from Desktop Recordings

Paper • 2411.08768 • Published Nov 13, 2024 • 10
AnimateAnything: Consistent and Controllable Animation for Video Generation

Paper • 2411.10836 • Published Nov 16, 2024 • 23

Previous
1
2
3
4
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs