Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published 7 days ago • 22
DateLogicQA: Benchmarking Temporal Biases in Large Language Models Paper • 2412.13377 • Published 8 days ago • 2
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published 6 days ago • 25
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Paper • 2412.14123 • Published 7 days ago • 11
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 7 days ago • 43
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated Nov 14 • 536
HelpSteer2-Preference: Complementing Ratings with Preferences Paper • 2410.01257 • Published Oct 2 • 21
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated Oct 15 • 148
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders Paper • 2407.13036 • Published Jul 17 • 2
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19 • 47
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27 • 42
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated 20 days ago • 180
Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Article • Published Apr 15 • 170