---
license: apache-2.0
---


<style>
  @import url('https://fonts.googleapis.com/css2?family=Vollkorn:ital,wght@0,400..900;1,400..900&display=swap');
  img {
    user-select: none;
    transition: all 0.2s ease;
    border-radius: .5rem;
  }
  img:hover {
    transform: rotate(2deg);
    filter: invert(100%);
  }
</style>

<div style="background-color: transparent; border-radius: .5rem; padding: 2rem; font-family: monospace; font-size: .85rem; text-align: justify;">
  
![cubby](https://huggingface.co./appvoid/cubby/resolve/main/cubby.webp)

This is a passthrough of arco with an experimental model. It improves on ARC Challenge (ARC-C), falling only 1.2 points short of modern 3b baseline performance.

If you need answers to multilingual, general-knowledge, trivially simple questions, choose qwen or llama. If you need to solve trivially simple English tasks with a model half the size, choose arco.

#### prompt

intentionally, no prompt is set.


#### benchmarks

zero-shot results from state-of-the-art small language models

| Parameters | Model    | MMLU      | ARC-C     | HellaSwag | PIQA      | Winogrande | Average   |
|------------|----------|-----------|-----------|-----------|-----------|------------|-----------|
| 0.5b       | qwen 2   | 44.13     | 28.92     | 49.05     | 69.31     | 56.99      | 49.68     |
| 0.3b       | smollm   | 25.52     | 37.71     | 56.41     | 71.93     | 59.27      | 50.17     |
| 0.5b       | danube 3 | 24.81     | 36.18     | 60.46     | 73.78     | 61.01      | 51.25     |
| 0.5b       | qwen 2.5 | **47.29** | 31.83     | 52.17     | 70.29     | 57.06      | 51.72     |
| 0.5b       | arco     | 26.17     | 37.29     | 62.88     | 74.37     | **62.27**  | 52.60     |
| 0.5b       | arco 2   | 25.51     | **38.82** | **63.02** | **74.70** | 61.25      | **52.66** |
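
The average column appears to be the unweighted mean of the five benchmark scores. The sketch below checks that assumption against the arco 2 row; the `average` helper and the dictionary layout are illustrative names, not part of any evaluation harness.

```python
# Scores copied from the arco 2 row of the benchmark table.
arco_2 = {
    "MMLU": 25.51,
    "ARC-C": 38.82,
    "HellaSwag": 63.02,
    "PIQA": 74.70,
    "Winogrande": 61.25,
}


def average(scores: dict[str, float]) -> float:
    """Unweighted mean across benchmarks, rounded to two decimals.

    Assumption: every benchmark counts equally, which matches the
    table's average column for this row.
    """
    return round(sum(scores.values()) / len(scores), 2)


print(average(arco_2))
```

For arco 2 this gives the listed 52.66. The same helper can be applied to any other row to compare models on equal footing.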

#### supporters

<a href="https://ko-fi.com/appvoid" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 34px !important; margin-top: -4px;width: 128px !important; filter: contrast(2) grayscale(100%) brightness(100%);" ></a>

#### trivia

arco also means "arc optimized", hence the focus on this cognitive-based benchmark.
</div>