Qwen2.5 Collection Qwen2.5 language models: pretrained and instruction-tuned models in 7 sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B). • 46 items • Updated 11 days ago • 552
SoundStorm: Efficient Parallel Audio Generation Paper • 2305.09636 • Published May 16, 2023 • 5
CLAP: Contrastive Language-Audio Pretraining Collection CLAP is to audio what CLIP is to image. • 5 items • Updated Oct 31, 2023 • 10
Article Design choices for Vision Language Models in 2024 By gigant • Apr 16, 2024 • 27
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities Paper • 2402.01831 • Published Feb 2, 2024 • 15
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 199
Article Upgrading Kokoro: natural TTS for short bursts By hexgrad • Nov 22, 2024 • 27
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 108
Cosmos Tokenizer Collection A suite of image and video tokenizers • 13 items • Updated Jan 17 • 39
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model Paper • 2402.03766 • Published Feb 6, 2024 • 14
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8, 2024 • 43
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31, 2024 • 27
Unbounded: A Generative Infinite Game of Character Life Simulation Paper • 2410.18975 • Published Oct 24, 2024 • 37