Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2411.14432

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 12
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 31

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 40
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 20

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published 19 days ago • 35
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published 19 days ago • 44
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published 13 days ago • 33
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published 21 days ago • 45

Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

THUdyh/Insight-V-Summary

Text Generation • Updated Nov 22, 2024 • 38 • 8
THUdyh/Insight-V-Reason

Text Generation • Updated Nov 22, 2024 • 32 • 9
THUdyh/Insight-V-Reason-LLaMA3

Text Generation • Updated Nov 22, 2024 • 36 • 3
THUdyh/Insight-V-Summary-LLaMA3

Text Generation • Updated Nov 22, 2024 • 25 • 3

Multimodal Alignment

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 34
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 71
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Paper • 2411.18203 • Published Nov 27, 2024 • 32
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 23

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 14
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 6

Synthetic Data and Self-Improvement

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 146
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 114
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Paper • 2402.07456 • Published Feb 12, 2024 • 42
Learning From Mistakes Makes LLM Better Reasoner

Paper • 2310.20689 • Published Oct 31, 2023 • 28

Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 37
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 53
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 32
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 23

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs