dipta007's Collections

Small Multimodal Models
Textbooks Are All You Need (arXiv:2306.11644)
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model (arXiv:2401.02330)
Textbooks Are All You Need II: phi-1.5 technical report (arXiv:2309.05463)
Visual Instruction Tuning (arXiv:2304.08485)
Improved Baselines with Visual Instruction Tuning (arXiv:2310.03744)
CORE-MM: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models (arXiv:2311.11567)
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs (arXiv:2311.15759)
Unlock the Power: Competitive Distillation for Multi-Modal Large Language Models (arXiv:2311.08213)
Generative Multimodal Models are In-Context Learners (arXiv:2312.13286)
InfMLLM: A Unified Framework for Visual-Language Tasks (arXiv:2311.06791)
MM-LLMs: Recent Advances in MultiModal Large Language Models (arXiv:2401.13601)
Efficient Multimodal Learning from Data-centric Perspective (arXiv:2402.11530)
Multi-modal preference alignment remedies regression of visual instruction tuning on language model (arXiv:2402.10884)
LLaMA Pro: Progressive LLaMA with Block Expansion (arXiv:2401.02415)
MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices (arXiv:2312.16886)
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model (arXiv:2402.03766)
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT (arXiv:2402.16840)
TinyLLaVA: A Framework of Small-scale Large Multimodal Models (arXiv:2402.14289)
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (arXiv:2402.14905)