Google just released PaliGemma 2 Mix: new versatile instruction vision language models π₯
> Three new models: 3B, 10B, 28B with res 224, 448 π > Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything π€―
Given an input image, it generates several queries along with explanations to justify them. This approach can generate synthetic data for fine-tuning ColPali models.
Multimodal π¬ - We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with it's retrieval models ColSmol for multimodal RAG π - UI-TARS are new models by ByteDance to unlock agentic GUI control π€― in 2B, 7B and 72B - Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B - MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context - Dataset: Yale released a new benchmark called MMVU - Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark
LLMs π - DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! π€― - Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B - NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)
Audio π£οΈ - Llasa is a new speech synthesis model based on Llama that comes in 1B,3B, and 8B - TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation β―οΈ - Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux - tencent released Hunyuan3D-2, new 3D asset generation from images
π₯ The AI Agent hype is real! This blog post deep dives into everything you need to know before deploying them: from key definitions to practical recommendations. A must-read for anyone building the future of autonomous systems.
π Key insight: A clear table breaking down the 5 levels of AI agents - from simple processors to fully autonomous systems. Essential framework for understanding where your agent stands on the autonomy spectrum
βοΈ Deep analysis of 15 core values reveals critical trade-offs: accuracy, privacy, safety, equity & more. The same features that make agents powerful can make them risky. Understanding these trade-offs is crucial for responsible deployment
π― 6 key recommendations for the road ahead: - Create rigorous evaluation protocols - Study societal effects - Understand ripple effects - Improve transparency - Open source can make a positive difference - Monitor base model evolution