Open LLMs are on fire right now! 🔥 DeepSeek-V2.5 and other top releases
Mistral AI just released Pixtral-12B, a vision model that seems to perform extremely well! On Mistral's own benchmarks, it beats the great Qwen2-7B and Llava-OV.
🤔 But Mistral's benchmarks evaluate with Chain-of-Thought prompting, and even in CoT they report lower scores for the other models than those models' already-published non-CoT results, which is very strange… Evaluation is not a settled science!
But it's only the latest in a flurry of great models. Here are the ones currently squatting at the top of the Hub's Models page:
❶ Llama-3.1-8B Omni, a model built upon Llama-3.1-8B-Instruct that simultaneously generates text and speech responses with an extremely low latency of 250ms (Moshi, Kyutai's 8B, did 140ms)
❷ 🗣️ Fish Speech v1.4, a text-to-speech model that supports 8 languages 🇬🇧🇨🇳🇩🇪🇯🇵🇫🇷🇪🇸🇰🇷🇸🇦 with extremely good quality for its light size (~1GB of weights) and low latency
❸ 🐳 DeepSeek-V2.5, a 236B model with 128k context length that combines the best of DeepSeek-V2-Chat and the more recent DeepSeek-Coder-V2-Instruct. Depending on the benchmark, it ranks just below Llama-3.1-405B. Released under a custom "deepseek" license that is quite permissive commercially (a quick API-call sketch follows after this list).
❹ Solar Pro, published by Upstage: a 22B model (so inference fits on a single GPU) that comes in just under Llama-3.1-70B performance: MMLU 79, GPQA 36, IFEval 84
❺ MiniCPM3-4B, a small model that claims very impressive scores, even beating much larger models like Llama-3.1-8B. Let's wait for more independent scores, because these look almost too good! (A quick local-run sketch follows after this list.)
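At 236B parameters, DeepSeek-V2.5 is out of reach for most local setups, so the easiest way to poke at it is through DeepSeek's hosted, OpenAI-compatible API. Here is a minimal sketch, assuming the base URL https://api.deepseek.com and the model name "deepseek-chat" from their docs (double-check both there), plus an API key exported as DEEPSEEK_API_KEY:

```python
# Minimal sketch: calling DeepSeek-V2.5 through its OpenAI-compatible endpoint.
# Assumed: base_url "https://api.deepseek.com", model name "deepseek-chat",
# and an API key in the DEEPSEEK_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python one-liner that squares the even numbers in a list."},
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```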
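And if you'd rather try the smallest of the bunch on your own machine, a minimal transformers sketch could look like the one below. The repo id openbmb/MiniCPM3-4B and the chat-template call are assumptions on my part; check the model card for the officially recommended snippet.

```python
# Minimal sketch: running MiniCPM3-4B locally with transformers.
# Assumed: the Hub repo id "openbmb/MiniCPM3-4B" and that the tokenizer ships
# a chat template; the model card is the source of truth for exact usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM3-4B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # new releases often ship custom modeling code
)

messages = [{"role": "user", "content": "In two sentences, why are small LLMs useful?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```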
Let's keep looking, more good stuff is coming our way!