-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 11 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 30
Collections
Discover the best community collections!
Collections including paper arxiv:2410.02712
-
Large Language Models as Markov Chains
Paper • 2410.02724 • Published • 28 -
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
Paper • 2410.02757 • Published • 35 -
LLaVA-Critic: Learning to Evaluate Multimodal Models
Paper • 2410.02712 • Published • 33 -
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Paper • 2410.02367 • Published • 45
-
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
Paper • 2408.00765 • Published • 12 -
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
Paper • 2407.21646 • Published • 18 -
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection
Paper • 2408.04284 • Published • 21 -
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability
Paper • 2408.07852 • Published • 14
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 67 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 125 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 124 -
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 49 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 12 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 64
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 54 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 24 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 67 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 72
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 25 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 12 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 36 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 19