SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 8 days ago
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? Paper • 2502.13233 • Published 10 days ago
Baichuan-M1: Pushing the Medical Capability of Large Language Models Paper • 2502.12671 • Published 10 days ago
Scaling Test-Time Compute Without Verification or RL is Suboptimal Paper • 2502.12118 • Published 11 days ago
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 10 days ago
Is Noise Conditioning Necessary for Denoising Generative Models? Paper • 2502.13129 • Published 10 days ago
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario Paper • 2501.10132 • Published Jan 17
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published about 1 month ago
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 24 days ago
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 18 days ago
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published 17 days ago
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 16 days ago
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published 14 days ago