Baichuan 🔥
✨ Launched an all-scenario reasoning model (language, visual, and search reasoning), with medical expertise as one of its key highlights: https://ying.baichuan-ai.com/chat
✨ Released the Baichuan-M1-14B medical LLM on the Hub, available in Base and Instruct versions, supporting English & Chinese.
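To poke at the medical model locally, a minimal sketch with transformers; the Instruct repo id below is assumed from the release naming, and trust_remote_code is assumed for the custom architecture:

```python
# A minimal sketch, assuming the Instruct repo id (taken from the release
# naming) and that the custom architecture needs trust_remote_code.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="baichuan-inc/Baichuan-M1-14B-Instruct",  # assumed Hub id
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What are common causes of iron-deficiency anemia?"}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])
```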
UI-TARS 🔥 a series of native GUI agent models (2B/7B/72B) released by ByteDance, combining perception, reasoning, grounding, and memory into one system.
What happened yesterday in the Chinese AI community? 🚀
T2A-01-HD 👉 https://hailuo.ai/audio
MiniMax's Text-to-Audio model, now in Hailuo AI, offers 300+ voices in 17+ languages and instant emotional voice cloning.
Trae 👉 https://www.trae.ai/
A new coding tool by ByteDance for professional developers, supporting English & Chinese, with free access to Claude 3.5 and GPT-4 for a limited time.
Kimi k1.5 👉 https://github.com/MoonshotAI/Kimi-k1.5 | https://kimi.ai/
An o1-level multimodal model by Moonshot AI, trained with reinforcement learning over long and short chains of thought, supporting context up to 128k tokens.
And today…
Hunyuan3D 2.0 👉 tencent/Hunyuan3D-2
A SoTA 3D synthesis system for high-resolution textured assets by Tencent Hunyuan, with open weights and code!
DeepSeek-R1 👉 deepseek-ai/DeepSeek-R1
✨ MIT license: enabling distillation for custom models
✨ Distilled 32B & 70B models match OpenAI o1-mini in multiple capabilities
✨ API live now! Access chain-of-thought reasoning with model='deepseek-reasoner'
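A minimal sketch of calling the new endpoint; DeepSeek exposes an OpenAI-compatible API, and per its API docs the reasoning trace comes back in a separate `reasoning_content` field (the prompt here is just an example):

```python
# A minimal sketch using the `openai` SDK against DeepSeek's
# OpenAI-compatible endpoint; requires a DEEPSEEK_API_KEY.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many prime numbers are below 100?"}],
)

message = response.choices[0].message
print(message.reasoning_content)  # the chain of thought
print(message.content)            # the final answer
```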
InternLM3-8B-Instruct 🔥 internlm/internlm3-67875827c377690c01a9131d
Trained on just 4T tokens, it outperforms Llama3.1-8B and Qwen2.5-7B on reasoning tasks, at 75% lower cost!
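A minimal sketch of running it with the standard transformers chat flow; the repo id is assumed from the collection naming, and trust_remote_code is set in case your transformers version predates native InternLM3 support:

```python
# A minimal sketch, assuming the internlm/internlm3-8b-instruct repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm3-8b-instruct"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Briefly explain why the sky is blue."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```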
MiniMax-01 🔥 an open model family by MiniMax
✨ MiniMax-Text-01 (MiniMaxAI/MiniMax-Text-01):
- 456B parameters with 45.9B activated per token
- Combines Lightning Attention, softmax attention, and MoE for optimal performance (see the toy sketch after this entry)
- Training context up to 1M tokens; inference handles 4M tokens
✨ MiniMax-VL-01 (MiniMaxAI/MiniMax-VL-01):
- ViT-MLP-LLM framework (non-transformer backbone)
- Handles image inputs from 336×336 to 2016×2016
- 694M image-caption pairs + 512B tokens processed across 4 stages
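For intuition, a toy sketch (not MiniMax's code) of what interleaving linear attention with softmax attention can look like; the 1-in-8 softmax ratio follows the MiniMax-01 report, while the (elu+1) kernel, the sizes, and the plain FFN standing in for the MoE FFN are illustrative assumptions:

```python
# Toy sketch of a hybrid linear/softmax attention stack; illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_attention(q, k, v):
    # Kernelized attention: O(n) in sequence length instead of O(n^2).
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)              # sum over keys
    z = q @ k.sum(dim=1, keepdim=True).transpose(1, 2)   # normalizer
    return torch.einsum("bnd,bde->bne", q, kv) / (z + 1e-6)

class HybridBlock(nn.Module):
    def __init__(self, dim: int, use_softmax: bool):
        super().__init__()
        self.use_softmax = use_softmax
        self.qkv = nn.Linear(dim, 3 * dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if self.use_softmax:
            h = x + F.scaled_dot_product_attention(q, k, v)
        else:
            h = x + linear_attention(q, k, v)
        return h + self.ffn(h)

# One softmax-attention block after every seven linear-attention blocks.
layers = nn.Sequential(*[HybridBlock(512, use_softmax=(i % 8 == 7)) for i in range(16)])
print(layers(torch.randn(2, 64, 512)).shape)  # torch.Size([2, 64, 512])
```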
MiniCPM-o 2.6 🔥 an end-side multimodal LLM released by OpenBMB from the Chinese community
Model: openbmb/MiniCPM-o-2_6
✨ Real-time English/Chinese conversation, emotion control, and ASR/TTS
✨ Real-time video/audio understanding
✨ Processes images up to 1.8M pixels, leads OCRBench & supports 30+ languages
QvQ-72B-Preview 🔥 an open-weight model for visual reasoning just released by the Alibaba Qwen team Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
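A minimal inference sketch via transformers, assuming QvQ shares the Qwen2-VL architecture and the Qwen/QVQ-72B-Preview repo id; at 72B this needs multiple GPUs or aggressive offloading:

```python
# A minimal sketch, assuming the Qwen2-VL architecture for QvQ.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/QVQ-72B-Preview"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

image = Image.open("diagram.png")  # any local image with a visual problem
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Solve the problem shown in the image step by step."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(
    output[:, inputs.input_ids.shape[-1]:], skip_special_tokens=True
)[0])
```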
Megrez-3B-Omni 🔥 an on-device multimodal LLM by Infinigence AI, another startup emerging from the Tsinghua University ecosystem.
Model: Infinigence/Megrez-3B-Omni
Demo: Infinigence/Megrez-3B-Omni
✨ Supports analysis of image, text, and audio modalities
✨ Leads in bilingual speech (English & Chinese) input, multi-turn conversations, and voice-based queries
✨ Outperforms in scene understanding and OCR across major benchmarks
Audio models 🔥
✨ Fish Speech 1.5: text-to-speech in 13 languages, trained on 1M+ hours of audio, by FishAudio fishaudio/fish-speech-1.5
✨ ClearVoice: an advanced voice processing framework by Alibaba Tongyi SpeechAI https://huggingface.co./alibabasglab
HunyuanVideo 📹 the new open video generation model by Tencent!
👉 tencent/HunyuanVideo
👉 zh-ai-community/video-models-666afd86cfa4e4dd1473b64c
✨ 13B parameters: probably the largest open video model to date
✨ Unified architecture for image & video generation
✨ Powered by advanced features: MLLM text encoder, 3D VAE, and prompt rewrite
✨ Delivers stunning visuals, diverse motion, and unparalleled stability
🚀 Fully open with code & weights
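A minimal generation sketch through diffusers, assuming the diffusers-format weights at hunyuanvideo-community/HunyuanVideo; the official tencent/HunyuanVideo repo also ships its own inference code:

```python
# A minimal sketch via diffusers; repo id assumed as noted above.
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()          # tame VAE memory on long clips
pipe.enable_model_cpu_offload()   # the 13B transformer is heavy

frames = pipe(
    prompt="A cat walks on the grass, realistic style.",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "output.mp4", fps=15)
```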
Zhipu AI, the Chinese generative AI startup behind CogVideo, just launched its first productized AI agent, AutoGLM 🔥
👉 https://agent.aminer.cn
With simple text or voice commands, it:
✨ Simulates phone operations effortlessly
✨ Autonomously handles 50+ step tasks
✨ Seamlessly operates across apps
Powered by Zhipu's "Decoupled Interface" and "Self-Evolving Learning Framework" to achieve major performance gains in Phone Use and Web Browser Use!
Meanwhile, GLM-Edge is now on the Hugging Face Hub 🚀
👉 THUDM/glm-edge-6743283c5809de4a7b9e0b8b
Packed with advanced dialogue + multimodal models:
📱 1.5B / 2B models: built for mobile & in-car systems
💻 4B / 5B models: optimized for PCs
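A minimal sketch of trying the smallest chat variant with transformers; the glm-edge-1.5b-chat repo id is assumed from the collection naming:

```python
# A minimal sketch; repo id assumed from the GLM-Edge collection.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-edge-1.5b-chat"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Give me three tips for safe winter driving."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```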