|
--- |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
|
|
--- |
|
|
|
![niro](https://huggingface.co./appvoid/niro/resolve/main/niro.webp) |
|
|
|
niro is an improvement over the excellent [WizardLM-Evol-V2-Unfiltered](https://huggingface.co./trollek/danube2-1.8b-WizardLM-Evol-V2-Unfiltered) model, which at the time of writing is the best 1.8-billion-parameter mistral-based model. Keep in mind that niro is an untrained merge; further improvements are yet to come.
|
|
|
#### benchmarks |
|
|
|
zero-shot evaluations performed against current sota small models; mmlu is still the main reason qwen models score higher on average. Currently, niro is on par with the best language models below 2b parameters.
|
|
|
| Parameters | Model | MMLU | ARC | HellaSwag | PIQA | Winogrande | Average | |
|
| -----------|--------------------------------|-------|-------|-----------|--------|------------|---------| |
|
| 0.5b | qwen 2.5 |47.29|31.83|52.17|70.29|57.06|51.72| |
|
| 0.5b | arco |26.17|37.29|62.88|74.37|62.27|52.60| |
|
| 0.5b | arco (exp) |25.51|38.82|63.02|74.70|61.25|52.66| |
|
| 1.7b | smollm |27.65|**46.26**|65.74|76.06|60.93| 55.33| |
|
| 1.8b | niro-preview |41.75|40.96|**72.07**|**77.97**|**65.51**|**59.65**|
|
| 1.5b | qwen 2.5 |**58.68**|44.71|67.62|75.73|62.67|**61.88**| |
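The Average column is the unweighted mean of the five benchmark scores. A minimal sketch to reproduce it from the table values above (the two rows shown here are just examples):

```python
# Recompute the Average column as the unweighted mean of the five benchmarks
# (MMLU, ARC, HellaSwag, PIQA, Winogrande), using values from the table above.
scores = {
    "niro-preview (1.8b)": [41.75, 40.96, 72.07, 77.97, 65.51],
    "qwen 2.5 (1.5b)": [58.68, 44.71, 67.62, 75.73, 62.67],
}

for model, vals in scores.items():
    avg = sum(vals) / len(vals)
    print(f"{model}: {avg:.2f}")
```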
|
|