Collections including paper arxiv:2405.09818

- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 52
- BRAVE: Broadening the visual encoding of vision-language models
  Paper • 2404.07204 • Published • 18
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 126

- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 78
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 90
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 60

- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 49
- The (R)Evolution of Multimodal Large Language Models: A Survey
  Paper • 2402.12451 • Published
- deepseek-ai/deepseek-vl-7b-base
  Updated • 154 • 43
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
  Paper • 2405.11273 • Published • 17
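
MoE-LLaVA and Uni-MoE in the collection above both rely on sparse mixture-of-experts layers, where a learned router sends each token to a small number of experts. The sketch below shows generic top-k routing only, not the specific design of either paper; the class name, expert shapes, and defaults are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts layer: a router scores all experts per
    token, the top k are evaluated, and their outputs are combined with the
    renormalized router weights."""
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        logits = self.router(x)                           # (tokens, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)        # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)              # renormalize over the selected k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

With k much smaller than num_experts, only a fraction of the parameters are active per token, which is how MoE models decouple total parameter count from per-token compute.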

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 182
- Flora: Low-Rank Adapters Are Secretly Gradient Compressors
  Paper • 2402.03293 • Published • 6
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation
  Paper • 2401.11316 • Published • 1
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 45
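
The entries above all build on the low-rank-update idea popularized by LoRA: keep the pretrained weight W frozen and learn a small correction BA (GaLore applies the projection to gradients rather than weights, and Flora reinterprets adapters as gradient compressors). As common ground, here is a minimal PyTorch-style sketch of the basic LoRA update; the class name, rank, and scaling defaults are illustrative, not taken from any listed paper:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen dense layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x)."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():      # pretrained weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_features, r))        # up-projection, zero-init
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```

Zero-initializing B makes the update a no-op at step zero, so fine-tuning starts exactly from the pretrained model, and only the two small factors A and B receive gradients.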

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 38
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 21
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 80
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 144
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- Neural Network Diffusion
  Paper • 2402.13144 • Published • 94
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 71
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 88
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 44

- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- Mistral 7B
  Paper • 2310.06825 • Published • 47
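
FlashAttention-2 in the collection above accelerates, without changing, the scaled dot-product attention introduced in "Attention Is All You Need". As a reference point, a minimal NumPy sketch of that baseline computation (the tiling and work-partitioning of FlashAttention-2 are not shown):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the core operation of the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (..., seq_q, d_v)
```

FlashAttention-2 produces the same result tile by tile, so the full seq_q by seq_k score matrix is never materialized in GPU memory.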

- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 39
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 126