---
license: cc-by-nc-sa-4.0
language:
- en
library_name: transformers
tags:
- UNA
- juanako
- mixtral
- MoE
---
# UNAversal - Uniform Neural Alignment (MoE)

This is just a beta, a first release, so people can start working on franken-merges and the like.
It does achieve high GSM/Math and TruthfulQA scores, so ideally you can merge it with other Mixtral models and see what comes out of it.
Based on [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co./mistralai/Mixtral-8x7B-Instruct-v0.1).
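A minimal inference sketch with `transformers`, assuming the standard `AutoModelForCausalLM` API and the `fblgit/UNAversal-8x7B-v1beta` checkpoint used in the eval runs below (8x7B weights are large, so multi-GPU or heavy offloading is assumed):

```python
# Minimal sketch; assumes enough GPU memory (or offloading) for 8x7B weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/UNAversal-8x7B-v1beta"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # float16 matches the eval runs below
    device_map="auto",           # shard across available GPUs
)

# Mixtral-Instruct style chat template
messages = [{"role": "user", "content": "Explain uniform neural alignment in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```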

## UNA Details
For this model we went with the most obvious approach: placing UNA on the router logits. It does work, but we saw much better performance by combining it with SFT.
So this model DOES have a UNA-SFT phase. It is highly experimental and merely used LLaMA-Factory datasets, e.g. alpaca.

As with the others:
- It can be fine-tuned further; try 2e-5 or **1e-4 (since it's a MoE)**. See the sketch after this list.
- It can be merged; here you will have to improvise, and please report your findings in a discussion thread.
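A minimal further-fine-tuning sketch with the `transformers` `Trainer`, showing only where the suggested learning rate goes. The tiny inline dataset is a placeholder for a real instruction set (e.g. alpaca prepared with LLaMA-Factory), and the `device_map="auto"` naive model parallelism stands in for a proper DeepSpeed/FSDP setup:

```python
# Sketch only: replace the placeholder dataset and parallelism setup for real runs.
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "fblgit/UNAversal-8x7B-v1beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Tiny placeholder dataset (one alpaca-style sample), already tokenized.
texts = ["### Instruction:\nSay hi.\n\n### Response:\nHi!"]
train_dataset = Dataset.from_dict(tokenizer(texts))

args = TrainingArguments(
    output_dir="unaversal-sft",
    learning_rate=1e-4,              # 2e-5 also works; 1e-4 suggested since it's a MoE
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```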

**REMINDER**: please cite; it really does help the research and the lab itself, seriously.

## NEED YOUR HELP!!
I need a multi-turn training loop for Mixtral that can properly squeeze the juice out of 8x H100s. Please feel free to reach out to @fblgit on either Discord or Twitter. Thanks!

# Evals
Here are some results; we also submitted the model to the HF eval queue.
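The tables below come from lm-evaluation-harness. A minimal sketch for reproducing a single task through its Python API, assuming lm-evaluation-harness v0.4+ (where `lm_eval.simple_evaluate` is available):

```python
# Sketch, assuming lm-evaluation-harness >= 0.4 (pip install lm-eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/UNAversal-8x7B-v1beta,dtype=float16",
    tasks=["gsm8k"],        # or ["arc_challenge"], ["truthfulqa_mc2"], ["bbh"], ...
    num_fewshot=5,          # 5-shot for GSM8k, as in the table below
    batch_size="auto",
)
print(results["results"]["gsm8k"])
```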

## GSM8k 5-Shot
```
|Tasks|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|-----|-------|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|Yaml   |get-answer|     5|exact_match|0.6603|±  | 0.013|
```
## ARC 25-Shot
```
|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |    25|acc     |0.6621|±  |0.0138|
|             |       |none  |    25|acc_norm|0.6962|±  |0.0134|
```

## TruthfulQA 0-Shot (MC2)
```
|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|-------|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml   |none  |     0|acc   |0.7122|±  |0.0141|
```

## 0-Shots Evals
```
|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|-------|------|-----:|----------|-----:|---|-----:|
|arc_challenge |Yaml   |none  |     0|acc       |0.6101|±  |0.0143|
|              |       |none  |     0|acc_norm  |0.6425|±  |0.0140|
|arc_easy      |Yaml   |none  |     0|acc       |0.8615|±  |0.0071|
|              |       |none  |     0|acc_norm  |0.8375|±  |0.0076|
|boolq         |Yaml   |none  |     0|acc       |0.8624|±  |0.0060|
|lambada_openai|Yaml   |none  |     0|perplexity|2.8318|±  |0.0507|
|              |       |none  |     0|acc       |0.7650|±  |0.0059|
|mathqa        |Yaml   |none  |     0|acc       |0.4472|±  |0.0091|
|              |       |none  |     0|acc_norm  |0.4436|±  |0.0091|
|piqa          |Yaml   |none  |     0|acc       |0.8292|±  |0.0088|
|              |       |none  |     0|acc_norm  |0.8422|±  |0.0085|
|pubmedqa      |Yaml   |none  |     0|acc       |0.7920|±  |0.0182|
|sciq          |Yaml   |none  |     0|acc       |0.9630|±  |0.0060|
|              |       |none  |     0|acc_norm  |0.9370|±  |0.0077|
```

## BBH
```
vllm (pretrained=fblgit/UNAversal-8x7B-v1beta,tensor_parallel_size=2,data_parallel_size=4,gpu_memory_utilization=0.8,dtype=float16), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto
|                          Tasks                           |Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|----------------------------------------------------------|-------|----------|-----:|-----------|-----:|---|-----:|
|bbh                                                       |N/A    |get-answer|     0|exact_match|0.6752|±  |0.1772|
| - bbh_cot_fewshot_boolean_expressions                    |Yaml   |get-answer|     0|exact_match|0.8840|±  |0.0203|
| - bbh_cot_fewshot_causal_judgement                       |Yaml   |get-answer|     0|exact_match|0.6417|±  |0.0352|
| - bbh_cot_fewshot_date_understanding                     |Yaml   |get-answer|     0|exact_match|0.7600|±  |0.0271|
| - bbh_cot_fewshot_disambiguation_qa                      |Yaml   |get-answer|     0|exact_match|0.7160|±  |0.0286|
| - bbh_cot_fewshot_dyck_languages                         |Yaml   |get-answer|     0|exact_match|0.1800|±  |0.0243|
| - bbh_cot_fewshot_formal_fallacies                       |Yaml   |get-answer|     0|exact_match|0.6520|±  |0.0302|
| - bbh_cot_fewshot_geometric_shapes                       |Yaml   |get-answer|     0|exact_match|0.3880|±  |0.0309|
| - bbh_cot_fewshot_hyperbaton                             |Yaml   |get-answer|     0|exact_match|0.9600|±  |0.0124|
| - bbh_cot_fewshot_logical_deduction_five_objects         |Yaml   |get-answer|     0|exact_match|0.5360|±  |0.0316|
| - bbh_cot_fewshot_logical_deduction_seven_objects        |Yaml   |get-answer|     0|exact_match|0.5040|±  |0.0317|
| - bbh_cot_fewshot_logical_deduction_three_objects        |Yaml   |get-answer|     0|exact_match|0.8600|±  |0.0220|
| - bbh_cot_fewshot_movie_recommendation                   |Yaml   |get-answer|     0|exact_match|0.7840|±  |0.0261|
| - bbh_cot_fewshot_multistep_arithmetic_two               |Yaml   |get-answer|     0|exact_match|0.6600|±  |0.0300|
| - bbh_cot_fewshot_navigate                               |Yaml   |get-answer|     0|exact_match|0.8160|±  |0.0246|
| - bbh_cot_fewshot_object_counting                        |Yaml   |get-answer|     0|exact_match|0.8360|±  |0.0235|
| - bbh_cot_fewshot_penguins_in_a_table                    |Yaml   |get-answer|     0|exact_match|0.7329|±  |0.0367|
| - bbh_cot_fewshot_reasoning_about_colored_objects        |Yaml   |get-answer|     0|exact_match|0.8120|±  |0.0248|
| - bbh_cot_fewshot_ruin_names                             |Yaml   |get-answer|     0|exact_match|0.4440|±  |0.0315|
| - bbh_cot_fewshot_salient_translation_error_detection    |Yaml   |get-answer|     0|exact_match|0.5200|±  |0.0317|
| - bbh_cot_fewshot_snarks                                 |Yaml   |get-answer|     0|exact_match|0.7135|±  |0.0340|
| - bbh_cot_fewshot_sports_understanding                   |Yaml   |get-answer|     0|exact_match|0.9400|±  |0.0151|
| - bbh_cot_fewshot_temporal_sequences                     |Yaml   |get-answer|     0|exact_match|0.7560|±  |0.0272|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects |Yaml   |get-answer|     0|exact_match|0.5680|±  |0.0314|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects|Yaml   |get-answer|     0|exact_match|0.6280|±  |0.0306|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects|Yaml   |get-answer|     0|exact_match|0.6280|±  |0.0306|
| - bbh_cot_fewshot_web_of_lies                            |Yaml   |get-answer|     0|exact_match|0.9560|±  |0.0130|
| - bbh_cot_fewshot_word_sorting                           |Yaml   |get-answer|     0|exact_match|0.3800|±  |0.0308|

|Groups|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|------|-------|----------|-----:|-----------|-----:|---|-----:|
|bbh   |N/A    |get-answer|     0|exact_match|0.6752|±  |0.1772|
```