With no regressions and mostly gains over the previous release, this version of Lamarck has broken the 41.0 average maximum for 14B parameter models. As of this writing, Lamarck v0.7 ranks #8 among models under 70B parameters on the Open LLM Leaderboard. Given the quality of the models in the 32B range, I think Lamarck deserves his shades. A little layer analysis of a model in the 14B range goes a long, long way.
The first DPO finetune of Lamarck has appeared! Check out jpacifico/Chocolatine-2-14B-Instruct-v2.0b2, whose notes say, "The Chocolatine model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance." Lamarck's painstaking merge process was intended to make finetuning to a desired polish as easy and energy-efficient as possible. Thank you, @jpacifico!
Lamarck 14B v0.7: A generalist merge with emphasis on multi-step reasoning, prose, and multi-language ability. The 14B parameter model class has a lot of strong performers, and Lamarck strives to be well-rounded and solid.
Lamarck is produced by a custom toolchain that automates complex sequences of LoRAs and various layer-targeting merges:
- Extracted LoRA adapters from special-purpose merges
- Custom base models and model_stocks of original models with LoRAs from huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2 to minimize the IFEVAL loss often seen in model_stock merges
- Separate branches for aggressive breadcrumbs and conservative DELLA merges
- Highly targeted weight/density gradients for every 2-4 layers, at each stage
- Finalization through SLERP+TIES merges recombining the breadcrumbs and DELLA branches to taste
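
To make two of the ideas above concrete, here is a minimal sketch of a per-layer gradient and a SLERP blend. It is not Lamarck's actual toolchain or any library's API: the helper names `expand_gradient` and `slerp`, the anchor values, and the layer count are all hypothetical, and tensors are stood in for by flat Python lists.

```python
import math


def expand_gradient(anchors, num_layers):
    """Linearly interpolate a short list of anchor values across num_layers layers.

    Hypothetical helper: a handful of anchors becomes one value per layer,
    which is one way to express a gradient that shifts every few layers.
    """
    if num_layers == 1 or len(anchors) == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, from 0 to len(anchors) - 1.
        pos = layer * (len(anchors) - 1) / (num_layers - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return values


def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    Falls back to plain linear interpolation when either vector is near zero
    or the two vectors are nearly parallel.
    """
    lerp = [(1 - t) * x + t * y for x, y in zip(a, b)]
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return lerp
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return lerp
    w_a = math.sin((1 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


# Assumed values for illustration only: 48 layers, three anchor points.
layer_t = expand_gradient([0.2, 0.5, 0.8], 48)

# Blend one toy "layer" from two branches using that layer's interpolation factor.
branch_a = [1.0, 0.0]
branch_b = [0.0, 1.0]
blended = slerp(layer_t[0], branch_a, branch_b)
```

In this sketch each layer gets its own interpolation factor, so early layers stay close to one branch and later layers lean toward the other. A real merge would apply the same idea tensor by tensor rather than to toy vectors.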
Lamarck's performance comes from an ancestry that goes back through careful merges to select finetuning work, upcycled and combined. Through intermediate merges, arcee-ai/Virtuoso-Small, sthenno-com/miscii-14b-1225, and VAGOsolutions/SauerkrautLM-v2-14b-DPO are emphasized in early layers for extra BBH; later layers add synergistic influence from deepseek-ai/DeepSeek-R1-Distill-Qwen-14B, Krystalan/DRT-o1-14B, EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2, and CultriX/Qwen2.5-14B-Wernicke.
More subjectively, its prose and translation abilities are boosted by repeated re-emphasis of Krystalan/DRT-o1-14B and underwoods/medius-erebus-magnum-14b. Other models found in sometimesanotion/Qwenvergence-14B-v3-Prose also contribute to prose quality, and show a surprising synergy with reasoning.
Kudos to @arcee-ai, @deepseek-ai, @Krystalan, @underwoods, @VAGOSolutions, @CultriX, @sthenno-com, and @rombodawg, whose models had the most influence. The Vimarckoso v3 model card documents its extended lineage.