Article DualPipe could be better without the Dual By ufotalent • about 18 hours ago • 9
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published 3 days ago • 46
MoBA: Mixture of Block Attention for Long-Context LLMs Paper • 2502.13189 • Published 10 days ago • 12
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 8 days ago • 92
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 12 days ago • 134
Logical Reasoning in Large Language Models: A Survey Paper • 2502.09100 • Published 15 days ago • 22
ACECODER: Acing Coder RL via Automated Test-Case Synthesis Paper • 2502.01718 • Published 25 days ago • 28
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published about 1 month ago • 55
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7 • 50
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published Jan 7 • 69
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 90
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 6, 2024 • 47
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation Paper • 2412.00927 • Published Dec 1, 2024 • 26
NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices Paper • 2408.10161 • Published Aug 19, 2024 • 15
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 36
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models Paper • 2411.07140 • Published Nov 11, 2024 • 33
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published Nov 11, 2024 • 47
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 287
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 39