Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2311.06783

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

Paper • 2401.10529 • Published Jan 19 • 1
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 26
SVIT: Scaling up Visual Instruction Tuning

Paper • 2307.04087 • Published Jul 9, 2023 • 6

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 26
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

Paper • 2311.07574 • Published Nov 13, 2023 • 14
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding

Paper • 2401.04575 • Published Jan 9 • 14
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Paper • 2402.00159 • Published Jan 31 • 59

LayoutPrompter: Awaken the Design Ability of Large Language Models

Paper • 2311.06495 • Published Nov 11, 2023 • 10
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 26
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 45
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Paper • 2311.04589 • Published Nov 8, 2023 • 18

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 26
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4

Paper • 2311.07361 • Published Nov 13, 2023 • 12
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 183
teknium/openhermes

Viewer • Updated Sep 7, 2023 • 243k • 398 • 198

Generative Multiple Modality

Random Field Augmentations for Self-Supervised Representation Learning

Paper • 2311.03629 • Published Nov 7, 2023 • 6
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Paper • 2311.04589 • Published Nov 8, 2023 • 18
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

Paper • 2311.04901 • Published Nov 8, 2023 • 7
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 26

OmnimatteRF: Robust Omnimatte with 3D Background Modeling

Paper • 2309.07749 • Published Sep 14, 2023 • 7
AudioSR: Versatile Audio Super-resolution at Scale

Paper • 2309.07314 • Published Sep 13, 2023 • 25
Generative Image Dynamics

Paper • 2309.07906 • Published Sep 14, 2023 • 52
MagiCapture: High-Resolution Multi-Concept Portrait Customization

Paper • 2309.06895 • Published Sep 13, 2023 • 27

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs