Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published 9 days ago • 56
Fine-tune Deepseek-R1 with a Synthetic Reasoning Dataset Article • By sdiazlor • 18 days ago • 44
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 207
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding Paper • 2501.18362 • Published 29 days ago • 21
Chat With Janus-Pro-7B 🌍 Space • A unified multimodal understanding and generation model. • Running on Zero • 1.84k
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 37
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 94
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 80
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 135
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 107
Running Your Custom LoRA Fine-Tuned MusicGen Large Locally Article • By theeseus-ai • Dec 6, 2024 • 1