Reithan
Reithan's activity
Censored
Sorry if this comment/ask is out of line or place, but I've been loving Lamarck and evangelizing it all over. One thing I'd love to see, given that Lamarck has R1 as an element, is a bit more consistency in its output of `<think>` blocks (specifically, without a `<think>` prefill it's not very consistent about adding one, and even with a prefill it sometimes forgets to close it with `</think>`).
Its use of its think block, and the way it incorporates that thought into the post-think output, is honestly better than I've seen even with base R1. It makes logical and mathematical jumps from one step to another that I haven't seen R1 do, and it's better at error-checking itself without vomiting out 5 paragraphs of "well let me double check". Not to mention its superior prose means the output actually explains what it thought up far better and more clearly.
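For anyone wanting to try the prefill workaround in the meantime, here's a minimal sketch with transformers; the repo id, prompt, and generation settings are placeholders rather than a confirmed recipe for Lamarck:

```python
# Minimal sketch of prefilling an opening <think> tag so the model starts its
# reasoning block reliably. The repo id and settings below are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Lamarck-14B"  # placeholder; substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 23? Show your reasoning."}]

# Render the chat template, then append "<think>\n" so generation begins inside
# the reasoning block instead of relying on the model to open it on its own.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
) + "<think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens; keep special tokens so a closing </think> isn't dropped.
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=False))
```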
C4ai-command-r-plus Tokenizing?
It took a custom toolchain around Arcee AI's mergekit to manage the complex merges, gradients, and LoRAs required to make this happen. I really like seeing features of many quality finetunes in one solid generalist model.
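To be clear, the sketch below is a generic illustration, not that custom toolchain: in mergekit terms a "gradient" is roughly a per-layer interpolation schedule, and a minimal example writes a slerp config with layer-wise weights and hands it to the `mergekit-yaml` CLI. Model names, layer counts, and weights here are purely placeholders.

```python
# Generic illustration of a mergekit "gradient": a per-layer interpolation
# schedule inside a slerp merge config. All model names and values are placeholders.
import subprocess
import textwrap

config = textwrap.dedent("""\
    slices:
      - sources:
          - model: your-org/base-14b          # placeholder base model
            layer_range: [0, 48]
          - model: your-org/finetune-14b      # placeholder finetune to blend in
            layer_range: [0, 48]
    merge_method: slerp
    base_model: your-org/base-14b
    parameters:
      t:
        - filter: self_attn
          value: [0.0, 0.3, 0.5, 0.7, 1.0]    # gradient: blend varies with depth
        - value: 0.5                          # default blend for everything else
    dtype: bfloat16
    """)

with open("merge-config.yml", "w") as f:
    f.write(config)

# mergekit's CLI reads the YAML config and writes the merged model to ./merged
subprocess.run(["mergekit-yaml", "merge-config.yml", "./merged"], check=True)
```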
I'd felt confident that 14B Qwen finetunes and merges could break the 42.0 average, and Arcee **came close** with https://huggingface.co./arcee-ai/Virtuoso-Small-2. Congratulations to @arcee-ai!
Just two months ago, it was easy to think that 14B had plateaued, and that you could have high IFEval or high MuSR/MATH/GPQA at 14B, but not both. That barrier is completely shattered. I see a pathway to even better, and Virtuoso Small 2 is a big part of why. Very impressive work. This community would expect no less from Arcee.
Just look at this graph! Keep in mind, my merges here build on the first Virtuoso Small, and *-DS merges build on DeepSeek R1. There are some impressive merges in the pipe!