Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 8 items • Updated 5 days ago • 379
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated Jan 8 • 563
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 24 days ago • 196
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 335
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations Paper • 2501.03403 • Published Jan 6 • 4
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 134
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 125
view article Article Releasing the largest multilingual open pretraining dataset By Pclanglais and 2 others • Nov 13, 2024 • 99
One missing piece in Vision and Language: A Survey on Comics Understanding Paper • 2409.09502 • Published Sep 14, 2024 • 25