Crispin Almodovar
calmodovar
AI & ML interests
NLP, log anomaly detection, cyber intelligence
Recent Activity
reacted to merve's post with 🔥 3 days ago
Oof, what a week! 🥵 So many things have happened, let's recap! https://huggingface.co./collections/merve/jan-24-releases-6793d610774073328eac67a9
Multimodal 💬
- We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with its retrieval models ColSmol for multimodal RAG 🔥
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging MM benchmark
LLMs
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models and their datasets (SFT and reward ones too!)
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images
calmodovar's activity
Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth
Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task