appvoid
/

niro-preview-2409


	---
	library_name: transformers
	tags:
	- mergekit
	- merge

	---

	![niro](https://huggingface.co./appvoid/niro/resolve/main/niro.webp)

	niro is an improvement over the excellent [WizardLM-Evol-V2-Unfiltered](https://huggingface.co./trollek/danube2-1.8b-WizardLM-Evol-V2-Unfiltered) model, which at the time of writing is the best 1.8-billion-parameter mistral model. Keep in mind that niro is an untrained merge; further improvements are yet to come.

	#### benchmarks

	zero-shot evaluations of current sota small models. The qwen models still come out ahead on average, mainly because of their mmlu scores. Currently, niro is on par with the best language model below 2b parameters.

	| Parameters | Model        | MMLU  | ARC   | HellaSwag | PIQA  | Winogrande | Average |
	|------------|--------------|-------|-------|-----------|-------|------------|---------|
	| 0.5b       | qwen 2.5     | 47.29 | 31.83 | 52.17     | 70.29 | 57.06      | 51.72   |
	| 0.5b       | arco         | 26.17 | 37.29 | 62.88     | 74.37 | 62.27      | 52.60   |
	| 0.5b       | arco (exp)   | 25.51 | 38.82 | 63.02     | 74.70 | 61.25      | 52.66   |
	| 1.7b       | smollm       | 27.65 | 46.26 | 65.74     | 76.06 | 60.93      | 55.33   |
	| 1.8b       | niro-preview | 41.75 | 40.96 | 72.07     | 77.97 | 65.51      | 59.65   |
	| 1.5b       | qwen 2.5     | 58.68 | 44.71 | 67.62     | 75.73 | 62.67      | 61.88   |
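
The Average column appears to be the plain arithmetic mean of the five benchmark scores. The sketch below recomputes it from the table and also computes a mean that leaves out mmlu, to show how much mmlu drives the ranking. The dictionary keys and the `summarize` helper are illustrative names, not part of any evaluation tooling used for this card.

```python
# Scores copied from the benchmark table, in column order:
# MMLU, ARC, HellaSwag, PIQA, Winogrande.
SCORES = {
    "qwen 2.5 (0.5b)": [47.29, 31.83, 52.17, 70.29, 57.06],
    "arco (0.5b)": [26.17, 37.29, 62.88, 74.37, 62.27],
    "arco exp (0.5b)": [25.51, 38.82, 63.02, 74.70, 61.25],
    "smollm (1.7b)": [27.65, 46.26, 65.74, 76.06, 60.93],
    "niro-preview (1.8b)": [41.75, 40.96, 72.07, 77.97, 65.51],
    "qwen 2.5 (1.5b)": [58.68, 44.71, 67.62, 75.73, 62.67],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def summarize(scores):
    """Return (model, mean over all five, mean without MMLU) per model."""
    rows = []
    for model, vals in scores.items():
        # vals[0] is MMLU, so vals[1:] drops it (hypothetical comparison,
        # not a metric reported in the table).
        rows.append((model, mean(vals), mean(vals[1:])))
    return rows


if __name__ == "__main__":
    print(f"{'model':<22}{'avg (all)':>10}{'avg (no mmlu)':>15}")
    for model, avg_all, avg_no_mmlu in summarize(SCORES):
        print(f"{model:<22}{avg_all:>10.2f}{avg_no_mmlu:>15.2f}")
```

With these numbers, niro-preview averages roughly 64.13 across the four non-mmlu benchmarks against roughly 62.68 for qwen 2.5 1.5b, which is consistent with mmlu being the main reason the qwen model leads overall.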