---
title: Benchlm
colorFrom: gray
colorTo: indigo
sdk: static
pinned: false
license: apache-2.0
short_description: llm benchmarks
---
```
Independent LLM benchmarks for a wide range of models using custom prompts,
including category and discipline summaries.

Tests are run using a modified llama.cpp server (supporting logprob completion mode) and/or textsynth server where noted.

METHODOLOGY:
   All CoT, code, and math tests are zero-shot.  A few BBH tests use few-shot examples.
   Math CoT tests such as GSM8K, APPLE, MATH etc. are self-graded against the correct answer using the LLM under test.
     If self-grading does not work reliably (such as with a very small model), the result is zeroed to mark the test invalid.
   All MC tests run two queries: the first with answers in test order and the second with answers circularly shifted by 1.
     To score a correct answer in MC, both queries must be answered correctly.
   Winogrande uses logprob completion (evaluates the probability of a common completion for the two possible cases).

TESTS:
   KNOWLEDGE:
      TQA - Truthful QA
      JEOPARDY - 100 Question JEOPARDY quiz
   LANGUAGE:
      LAMBADA - Language Modeling Broadened to Account for Discourse Aspects
   UNDERSTANDING:
      WG - Winogrande
      BOOLQ - Boolean questions
      STORYCLOZE - Story questions
      OBQA - Open Book Question / Answer
      SIQA - Social IQ
      RACE - Reading comprehension dataset from examinations
      MMLU - massive multitask language understanding
      MEDQA - medical QA
   REASONING:
      CSQA - Common Sense Question Answer
      COPA - Choice of Plausible Alternatives
      HELLASWAG - Hella Situations with Adversarial Generations
      PIQA - Physical Interaction: Question Answering
      ARC - AI2 Reasoning Challenge
      AGIEVAL - AGIEval logiqa, lsat, sat
      AGIEVALC  - Gaokao SAT, logiqa, jec (Chinese)
      MUSR - Multistep Soft Reasoning
   COT:
      GSM8K - Grade School Math CoT
      BBH  - Beyond the Imitation Game Bench Hard CoT
      MMLUPRO - massive multitask language understanding pro CoT
      AGIEVAL - satmath, aquarat
      AGIEVALC  - mathcloze, mathqa (Chinese)
      MUSR - Multistep Soft Reasoning
      APPLE - 100 custom Apple Questions
   MATH:
      MATH1..MATH5 - MATH Datasets level 1 through 5 (Hendrycks et al.)
   CODE:
      HUMANEVAL - Python
      HUMANEVALP - Python, extended test
      HUMANEVALX - Python, Java, Javascript, C++
      MBPP - Python
      MBPPP - Python, extended test
      CRUXEVAL - Python
      USE {TEST}FIM FOR FIM TEST, i.e. HUMANEVAL->HUMANEVALFIM
```
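
The MC and Winogrande scoring rules in the methodology can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual harness: `ask` is a hypothetical callable standing in for a query to the model under test, the function names are made up for this example, and the shift direction (rotating the options right by one) is an assumption, since the methodology only says the answers are circularly shifted by 1.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: takes a question and its answer options and
# returns the index of the option the model selected.
AskFn = Callable[[str, Sequence[str]], int]


def shift_choices(choices: Sequence[str], answer_idx: int, k: int = 1) -> Tuple[List[str], int]:
    """Rotate the options right by k (assumed direction) and track the correct answer."""
    n = len(choices)
    shifted = [choices[(i - k) % n] for i in range(n)]
    return shifted, (answer_idx + k) % n


def score_mc(ask: AskFn, question: str, choices: Sequence[str], answer_idx: int) -> bool:
    """Credit the item only if both the original and the shifted query are answered correctly."""
    first_ok = ask(question, list(choices)) == answer_idx
    shifted, shifted_idx = shift_choices(choices, answer_idx)
    second_ok = ask(question, shifted) == shifted_idx
    return first_ok and second_ok


def pick_by_logprob(logprobs_case1: Sequence[float], logprobs_case2: Sequence[float]) -> int:
    """Return 0 or 1 for whichever case gives the common completion the higher total logprob."""
    return 0 if sum(logprobs_case1) >= sum(logprobs_case2) else 1


# Toy models: one always picks the first option, one picks by content.
question = "Which planet is closest to the Sun?"
choices = ["Mercury", "Venus", "Earth", "Mars"]


def always_first(_question: str, _options: Sequence[str]) -> int:
    return 0


def knows_answer(_question: str, options: Sequence[str]) -> int:
    return list(options).index("Mercury")


print(score_mc(always_first, question, choices, 0))   # False: fails the shifted query
print(score_mc(knows_answer, question, choices, 0))   # True: correct in both orders
print(pick_by_logprob([-0.2, -0.5], [-1.1, -0.9]))    # 0: first case is more likely
```

Requiring both orderings to be correct means a model that always picks the same position gets no credit, which is likely why MC scores here can sit well below figures reported for single-query evaluation.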

GENERAL MODELS:

 TEST					| EXAONE-3.5-2.4B-Instruct | EXAONE-3.5-7.8B-Instruct | Falcon3-1B-Instruct | Falcon3-7B-Instruct | Falcon3-10B-Instruct | gemma-2-2b-it | gemma-2-9b-it | gemma-2-9b-it | gemma-2-9b-it | gemma-2-27b-it | glm-4-9b-chat | glm-4-9b-chat | granite-3.0-2b-instruct | granite-3.0-8b-instruct | granite-3.1-1b-a400m-instruct | granite-3.1-2b-instruct | granite-3.1-8b-instruct | internlm2_5-7b-chat | Meta-Llama-3-8B-Instruct | Llama-3.1-8B-Instruct | Llama-3.1-8B-Instruct | Llama-3.1-8B-Instruct | Llama-3.1-8B-Instruct | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct | Marco-o1 | Ministral-8B-Instruct-2410 | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Instruct-2407 | openchat-3.5-0106 | openchat-3.6-8b-20240522 | Phi-3-mini-4k-instruct | Phi-3-mini-128k-instruct | Phi-3-mini-128k-instruct | Phi-3.5-mini-8k-instruct | Phi-3.5-mini-128k-instruct | Phi-3-medium-128k-instruct | Phi-4 | Qwen2-7B-Instruct | Qwen2-7B-Instruct | Qwen2.5-3B-32k-Instruct | Qwen2.5-3B-32k-Instruct | Qwen2.5-7B-32k-Instruct | Qwen2.5-7B-32k-Instruct | Qwen2.5-14B-32k-Instruct | Qwen2.5-32B-Instruct | QwQ-32B-Preview | SOLAR-10.7B-Instruct-v1.0 | solar-pro-preview-instruct |
---------------------------------------------|--------------------------|--------------------------|---------------------|---------------------|----------------------|---------------|---------------|---------------|---------------|----------------|---------------|---------------|-------------------------|-------------------------|-------------------------------|-------------------------|-------------------------|---------------------|--------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|----------|----------------------------|--------------------------|--------------------------------|-------------------|--------------------------|------------------------|--------------------------|--------------------------|--------------------------|----------------------------|----------------------------|-------|-------------------|-------------------|-------------------------|-------------------------|-------------------------|-------------------------|--------------------------|----------------------|-----------------|---------------------------|----------------------------|
 params					| 2.67B                    | 7.82B                    | 1.67B               | 7.46B               | 10.31B               | 2.61B         | 9.24B         | 9.24B         | 9.24B         | 27.23B         | 9.40B         | 9.40B         | 2.63B                   | 8.17B                   | 1.33B                         | 2.53B                   | 8.17B                   | 7.74B               | 8.03B                    | 8.03B                 | 8.03B                 | 8.03B                 | 8.03B                 | 1.24B                 | 3.21B                 | 7.62B    | 8.02B                      | 7.25B                    | 12.25B                         | 7.24B             | 8.03B                    | 3.82B                  | 3.82B                    | 3.82B                    | 3.82B                    | 3.82B                      | 13.96B                     | 14.66B| 7.62B             | 7.62B             | 3.09B                   | 3.09B                   | 7.62B                   | 7.62B                   | 14.77B                   | 32.76B               | 32.76B          | 10.73B                    | 22.14B                     |
 quant					| IQ4_XS                   | Q6_K                     | IQ4_XS              | Q6_K                | IQ4_XS               | Q8_0          | IQ4_XS        | Q4_K_M        | Q6_K          | IQ4_XS         | IQ4_XS        | Q6_K          | Q6_K                    | Q6_K                    | Q6_K                          | Q6_K                    | IQ4_XS                  | IQ4_XS              | Q6_K                     | Q4                    | IQ4_XS                | Q4_K_M                | Q6_K                  | IQ4_XS                | Q6_K                  | IQ4_XS   | Q6_K                       | Q8_0                     | IQ4_XS                         | Q8_0              | Q8_0                     | Q8_0                   | IQ4_XS                   | Q6_K                     | Q6_K                     | Q6_K                       | IQ4_XS                     | IQ4_XS| Q6_K              | Q4                | IQ4_XS                  | Q6_K                    | IQ4_XS                  | Q6_K                    | IQ4_XS                   | IQ4_XS               | IQ4_XS          | Q4_K_M                    | IQ4_XS                     |
 engine					| llama.cpp version: 4384  | llama.cpp version: 4291  | llama.cpp version: 4341 | llama.cpp version: 4341 | llama.cpp version: 4341 | llama.cpp version: 3496| llama.cpp version: 3334| llama.cpp version: 3325| llama.cpp version: 3266| llama.cpp version: 3389| llama.cpp version: 3496| llama.cpp version: 3334| llama.cpp version: 3985 | llama.cpp version: 3985 | llama.cpp version: 4384       | llama.cpp version: 4341 | llama.cpp version: 4363 | llama.cpp version: 3496| llama.cpp version: 3266  | textsynth ts_server version 2024-09-30| llama.cpp version: 3707| llama.cpp version: 3731| llama.cpp version: 3428| llama.cpp version: 4341 | llama.cpp version: 3825| llama.cpp version: 4240 | llama.cpp version: 3927    | llama.cpp version: 3262  | llama.cpp version: 3428        | llama.cpp version: 3262| llama.cpp version: 3262  | llama.cpp version: 3520| llama.cpp version: 3565  | llama.cpp version: 3520  | llama.cpp version: 3609  | llama.cpp version: 3600    | llama.cpp version: 3505    | llama.cpp version: 4295 | llama.cpp version: 3609| textsynth ts_server version 2024-09-30| llama.cpp version: 4038 | llama.cpp version: 4038 | llama.cpp version: 3943 | llama.cpp version: 3870 | llama.cpp version: 3821  | llama.cpp version: 3821| llama.cpp version: 4273 | llama.cpp version: 3235   | llama.cpp version: 3790    |
---------------------------------------------|--------------------------|--------------------------|---------------------|---------------------|----------------------|---------------|---------------|---------------|---------------|----------------|---------------|---------------|-------------------------|-------------------------|-------------------------------|-------------------------|-------------------------|---------------------|--------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|----------|----------------------------|--------------------------|--------------------------------|-------------------|--------------------------|------------------------|--------------------------|--------------------------|--------------------------|----------------------------|----------------------------|-------|-------------------|-------------------|-------------------------|-------------------------|-------------------------|-------------------------|--------------------------|----------------------|-----------------|---------------------------|----------------------------|
 WG                                          | 0.636                    | 0.696                    | 0.600               | 0.670               | 0.700                | 0.701         | 0.756         | 0.761         | 0.762         | 0.772          | 0.759         | 0.753         | 0.679                   | 0.719                   | 0.609                         | 0.700                   | 0.752                   | 0.822               | 0.707                    | 0.740                 | 0.750                 | 0.745                 | 0.741                 | 0.612                 | 0.685                 | 0.695    | 0.748                      | 0.751                    | 0.770                          | 0.783             | 0.760                    | 0.737                  | 0.728                    | 0.727                    | 0.744                    | 0.734                      | 0.744                      | 0.708 | 0.705             | 0.681             | 0.687                   | 0.695                   | 0.709                   | 0.709                   | 0.754                    | 0.746                | 0.750           | 0.759                     | 0.779                      |
 LAMBADA                                     | 0.613                    | 0.680                    | 0.524               | 0.688               | 0.692                | 0.624         | 0.733         | 0.732         | 0.735         | 0.755          | 0.786         | 0.783         | 0.746                   | 0.799                   | 0.665                         | 0.732                   | 0.790                   | 0.732               | 0.710                    | 0.729                 | 0.740                 | 0.738                 | 0.747                 | 0.610                 | 0.705                 | 0.715    | 0.776                      | 0.766                    | 0.714                          | 0.744             | 0.733                    | 0.613                  | 0.638                    | 0.618                    | 0.677                    | 0.613                      | 0.632                      | 0.750 | 0.735             | 0.564             | 0.685                   | 0.682                   | 0.722                   | 0.724                   | 0.769                    | 0.781                | 0.780           | 0.654                     | 0.708                      |
 HELLASWAG                                   | 0.646                    | 0.788                    | 0.308               | 0.684               | 0.716                | 0.496         | 0.766         | 0.798         | 0.775         | 0.810          | 0.834         | 0.840         | 0.583                   | 0.696                   | 0.053                         | 0.517                   | 0.693                   | 0.916               | 0.667                    | 0.739                 | 0.684                 | 0.696                 | 0.696                 | 0.293                 | 0.559                 | 0.809    | 0.835                      | 0.591                    | 0.726                          | 0.800             | 0.824                    | 0.738                  | 0.695                    | 0.743                    | 0.716                    | 0.669                      | 0.807                      | 0.801 | 0.755             | 0.765             | 0.670                   | 0.713                   | 0.820                   | 0.822                   | 0.863                    | 0.894                | 0.875           | 0.826                     | 0.823                      |
 BOOLQ                                       | 0.533                    | 0.582                    | 0.364               | 0.591               | 0.621                | 0.542         | 0.684         | 0.701         | 0.687         | 0.739          | 0.633         | 0.625         | 0.540                   | 0.594                   | 0.558                         | 0.204                   | 0.576                   | 0.572               | 0.609                    | 0.621                 | 0.587                 | 0.576                 | 0.610                 | 0.023                 | 0.478                 | 0.624    | 0.574                      | 0.658                    | 0.677                          | 0.719             | 0.677                    | 0.577                  | 0.562                    | 0.540                    | 0.562                    | 0.573                      | 0.635                      | 0.653 | 0.633             | 0.575             | 0.517                   | 0.533                   | 0.617                   | 0.623                   | 0.647                    | 0.701                | 0.629           | 0.651                     | 0.665                      |
 STORYCLOZE                                  | 0.922                    | 0.950                    | 0.774               | 0.949               | 0.947                | 0.877         | 0.948         | 0.959         | 0.958         | 0.973          | 0.967         | 0.976         | 0.925                   | 0.924                   | 0.440                         | 0.864                   | 0.955                   | 0.996               | 0.930                    | 0.685                 | 0.884                 | 0.859                 | 0.895                 | 0.421                 | 0.870                 | 0.895    | 0.937                      | 0.917                    | 0.921                          | 0.982             | 0.980                    | 0.907                  | 0.906                    | 0.891                    | 0.531                    | 0.921                      | 0.969                      | 0.754 | 0.959             | 0.964             | 0.913                   | 0.896                   | 0.920                   | 0.915                   | 0.938                    | 0.981                | 0.964           | 0.973                     | 0.949                      |
 CSQA                                        | 0.647                    | 0.740                    | 0.488               | 0.725               | 0.746                | 0.631         | 0.741         | 0.751         | 0.751         | 0.763          | 0.727         | 0.733         | 0.615                   | 0.692                   | 0.098                         | 0.587                   | 0.686                   | 0.760               | 0.639                    | 0.732                 | 0.698                 | 0.678                 | 0.686                 | 0.404                 | 0.642                 | 0.760    | 0.669                      | 0.627                    | 0.664                          | 0.796             | 0.875                    | 0.679                  | 0.656                    | 0.675                    | 0.669                    | 0.660                      | 0.751                      | 0.740 | 0.728             | 0.726             | 0.701                   | 0.717                   | 0.768                   | 0.781                   | 0.795                    | 0.823                | 0.796           | 0.714                     | 0.726                      |
 OBQA                                        | 0.660                    | 0.773                    | 0.380               | 0.761               | 0.745                | 0.647         | 0.841         | 0.846         | 0.846         | 0.860          | 0.821         | 0.802         | 0.605                   | 0.714                   | 0.119                         | 0.575                   | 0.719                   | 0.818               | 0.685                    | 0.787                 | 0.738                 | 0.753                 | 0.765                 | 0.394                 | 0.709                 | 0.800    | 0.769                      | 0.676                    | 0.730                          | 0.834             | 0.803                    | 0.773                  | 0.750                    | 0.761                    | 0.751                    | 0.720                      | 0.830                      | 0.857 | 0.800             | 0.771             | 0.700                   | 0.731                   | 0.802                   | 0.804                   | 0.863                    | 0.904                | 0.882           | 0.794                     | 0.812                      |
 COPA                                        | 0.842                    | 0.907                    | 0.612               | 0.870               | 0.903                | 0.823         | 0.923         | 0.926         | 0.925         | 0.949          | 0.955         | 0.944         | 0.806                   | 0.844                   | 0.222                         | 0.809                   | 0.884                   | 0.927               | 0.886                    | 0.908                 | 0.878                 | 0.859                 | 0.889                 | 0.557                 | 0.749                 | 0.922    | 0.887                      | 0.812                    | 0.859                          | 0.967             | 0.947                    | 0.890                  | 0.898                    | 0.884                    | 0.884                    | 0.870                      | 0.924                      | 0.934 | 0.923             | 0.912             | 0.841                   | 0.858                   | 0.925                   | 0.919                   | 0.935                    | 0.958                | 0.936           | 0.910                     | 0.940                      |
 PIQA                                        | 0.624                    | 0.722                    | 0.233               | 0.696               | 0.732                | 0.591         | 0.799         | 0.803         | 0.801         | 0.841          | 0.773         | 0.779         | 0.497                   | 0.712                   | 0.171                         | 0.593                   | 0.720                   | 0.787               | 0.681                    | 0.819                 | 0.707                 | 0.716                 | 0.725                 | 0.342                 | 0.637                 | 0.769    | 0.694                      | 0.708                    | 0.777                          | 0.794             | 0.771                    | 0.761                  | 0.745                    | 0.741                    | 0.733                    | 0.677                      | 0.827                      | 0.832 | 0.784             | 0.778             | 0.695                   | 0.713                   | 0.794                   | 0.807                   | 0.848                    | 0.870                | 0.829           | 0.799                     | 0.803                      |
 SIQA                                        | 0.667                    | 0.697                    | 0.425               | 0.658               | 0.688                | 0.597         | 0.691         | 0.694         | 0.693         | 0.731          | 0.664         | 0.665         | 0.627                   | 0.678                   | 0.139                         | 0.592                   | 0.684                   | 0.735               | 0.624                    | 0.712                 | 0.634                 | 0.643                 | 0.648                 | 0.374                 | 0.622                 | 0.713    | 0.684                      | 0.620                    | 0.655                          | 0.730             | 0.726                    | 0.675                  | 0.662                    | 0.675                    | 0.667                    | 0.661                      | 0.729                      | 0.639 | 0.699             | 0.696             | 0.656                   | 0.663                   | 0.721                   | 0.712                   | 0.746                    | 0.742                | 0.714           | 0.727                     | 0.716                      |
 MEDQA                                       | 0.262                    | 0.374                    | 0.141               | 0.420               | 0.430                | 0.286         | 0.492         | 0.498         | 0.501         | 0.549          | 0.436         | 0.445         | 0.242                   | 0.336                   | 0.032                         | 0.223                   | 0.359                   | 0.389               | 0.486                    | 0.556                 | 0.491                 | 0.482                 | 0.500                 | 0.150                 | 0.413                 | 0.422    | 0.399                      | 0.334                    | 0.465                          | 0.372             | 0.427                    | 0.457                  | 0.421                    | 0.446                    | 0.423                    | 0.395                      | 0.553                      | 0.560 | 0.391             | 0.380             | 0.344                   | 0.363                   | 0.453                   | 0.458                   | 0.542                    | 0.610                | 0.598           | 0.368                     | 0.538                      |
 JEOPARDY                                    | 0.170                    | 0.420                    | 0.010               | 0.400               | 0.310                | 0.220         | 0.510         | 0.570         | 0.550         | 0.740          | 0.370         | 0.420         | 0.230                   | 0.470                   | 0.960                         | 0                       | 0.400                   | 0.300               | 0.370                    | 0.450                 | 0.500                 | 0.390                 | 0.510                 | 0                     | 0.350                 | 0.250    | 0.330                      | 0.490                    | 0.470                          | 0.450             | 0.510                    | 0.300                  | 0.220                    | 0.250                    | 0.320                    | 0.250                      | 0.460                      | 0.550 | 0.200             | 0.250             | 0.120                   | 0.120                   | 0.300                   | 0.290                   | 0.540                    | 0.600                | 0.600           | 0.480                     | 0.480                      |
 GSM8K                                       | 0.814                    | 0.902                    | 0.485               | 0.890               | 0.918                | 0.645         | 0.878         | 0.881         | 0.890         | 0.899          | 0.855         | 0.839         | 0.712                   | 0.811                   | 0.416                         | 0.741                   | 0.843                   | 0.855               | 0.817                    | 0.851                 | 0.859                 | 0.859                 | 0.872                 | 0.490                 | 0.822                 | 0.912    | 0.887                      | 0.611                    | 0.828                          | 0.775             | 0.811                    | 0.870                  | 0.706                    | 0.833                    | 0.855                    | 0.714                      | 0.667                      | 0.946 | 0.871             | 0.535             | 0.829                   | 0.856                   | 0.909                   | 0.917                   | 0.938                    | 0.950                | 0.962           | 0.730                     | 0.812                      |
 APPLE                                       | 0.540                    | 0.720                    | 0.150               | 0.810               | 0.740                | 0.350         | 0.700         | 0.690         | 0.750         | 0.730          | 0.630         | 0.610         | 0.370                   | 0.590                   | 0.150                         | 0.410                   | 0.560                   | 0.650               | 0.580                    |  -                    | 0.600                 | 0.690                 | 0.690                 | 0.230                 | 0.610                 | 0.710    | 0.760                      | 0.390                    | 0.690                          | 0.500             | 0.520                    | 0.610                  | 0.480                    | 0.540                    | 0.560                    | 0.560                      | 0.650                      | 0.910 | 0.630             |  -                | 0.640                   | 0.560                   | 0.740                   | 0.750                   | 0.830                    | 0.860                | 0.870           | 0.510                     | 0.650                      |
 HUMANEVAL                                   | 0.725                    | 0.804                    | 0.115               | 0.737               | 0.774                | 0.408         | 0.646         | 0.621         | 0.658         | 0.743          | 0.737         | 0.731         | 0.469                   | 0.621                   | 0.359                         | 0.536                   | 0.689                   | 0.317               | 0.591                    | 0.652                 | 0.634                 | 0.628                 | 0.652                 | 0.298                 | 0.585                 | 0.810    | 0.768                      | 0.390                    | 0.689                          | 0.310             | 0.689                    | 0.707                  | 0.628                    | 0.652                    | 0.682                    | 0.621                      | 0.268                      | 0.847 | 0.719             | 0.567             | 0.695                   | 0.780                   | 0.798                   | 0.817                   | 0.804                    | 0.884                | 0.414           | 0.402                     | 0.506                      |
 HUMANEVALP                                  | 0.646                    | 0.701                    | 0.073               | 0.628               | 0.664                | 0.310         | 0.548         | 0.530         | 0.548         | 0.615          | 0.615         | 0.634         | 0.402                   | 0.536                   | 0.286                         | 0.469                   | 0.591                   | 0.250               | 0.335                    | 0.524                 | 0.518                 | 0.524                 | 0.536                 | 0.225                 | 0.475                 | 0.701    | 0.628                      | 0.329                    | 0.554                          | 0.304             | 0.579                    | 0.603                  | 0.524                    | 0.567                    | 0.591                    | 0.524                      | 0.219                      | 0.725 | 0.609             | 0.475             | 0.615                   | 0.682                   | 0.670                   | 0.658                   | 0.676                    | 0.768                | 0.359           | 0.286                     | 0.432                      |
 MBPP                                        | 0.548                    | 0.618                    | 0.334               | 0.669               | 0.653                | 0.459         | 0.591         | 0.595         | 0.595         | 0.642          | 0.579         | 0.591         | 0.470                   | 0.548                   | 0.424                         | 0.513                   | 0.591                   | 0.521               | 0.050                    | 0.544                 | 0.575                 | 0.583                 | 0.564                 | 0.381                 | 0.498                 | 0.642    | 0.626                      | 0.451                    | 0.513                          | 0.470             | 0.326                    | 0.536                  | 0.482                    | 0.451                    | 0.610                    | 0.498                      | 0.412                      | 0.673 | 0.587             | 0.501             | 0.595                   | 0.599                   | 0.669                   | 0.661                   | 0.669                    | 0.684                | 0.404           | 0.373                     | 0.330                      |
 MBPPP                                       | 0.508                    | 0.575                    | 0.312               | 0.625               | 0.611                | 0.441         | 0.575         | 0.575         | 0.584         | 0.638          | 0.562         | 0.575         | 0.433                   | 0.517                   | 0.392                         | 0.504                   | 0.522                   | 0.477               | 0.049                    | 0.526                 | 0.526                 | 0.535                 | 0.540                 | 0.397                 | 0.482                 | 0.611    | 0.580                      | 0.397                    | 0.428                          | 0.410             | 0.151                    | 0.482                  | 0.459                    | 0.450                    | 0.575                    | 0.477                      | 0.401                      | 0.651 | 0.580             | 0.495             | 0.540                   | 0.584                   | 0.633                   | 0.651                   | 0.633                    | 0.700                | 0.392           | 0.366                     | 0.321                      |
 HUMANEVALFIM                                |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             |  -            |  -            |  -                      |  -                      |  -                            |  -                      |  -                      |  -                  |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    |  -       |  -                         |  -                       | 0.512                          |  -                |  -                       |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    |  -                |  -                |  -                      |  -                      |  -                      |  -                      |  -                       |  -                   |  -              |  -                        |  -                         |
 HUMANEVALX_cpp                              | 0.420                    | 0.597                    | 0.054               | 0.506               | 0.603                | 0.152         | 0.500         | 0.500         | 0.512         | 0.579          | 0.439         | 0.432         | 0.231                   | 0.347                   | 0.213                         | 0.268                   | 0.408                   | 0.231               | 0.317                    | 0.256                 | 0.420                 | 0.445                 | 0.457                 | 0.158                 | 0.323                 | 0.310    | 0.573                      | 0.225                    | 0.292                          | 0.347             | 0.060                    | 0.262                  | 0.243                    | 0.243                    | 0.280                    | 0.219                      | 0                          | 0.676 | 0.542             | 0.384             | 0.420                   | 0.237                   | 0.475                   | 0.554                   | 0.323                    | 0.701                | 0.378           | 0.237                     | 0.219                      |
 HUMANEVALX_java                             | 0.536                    | 0.689                    | 0.042               | 0.640               | 0.719                | 0.390         | 0.628         | 0.640         | 0.640         | 0.768          | 0.207         | 0.628         | 0.353                   | 0.365                   | 0.231                         | 0.390                   | 0.518                   | 0.170               | 0.079                    | 0.036                 | 0.396                 | 0.097                 | 0.487                 | 0                     | 0.439                 | 0.634    | 0.731                      | 0.256                    | 0.201                          | 0.170             | 0.073                    | 0.493                  | 0.030                    | 0.024                    | 0.079                    | 0.060                      | 0                          | 0.634 | 0.628             | 0.365             | 0.640                   | 0.615                   | 0.695                   | 0.737                   | 0.780                    | 0.865                | 0.097           | 0.347                     | 0                          |
 HUMANEVALX_js                               | 0.573                    | 0.731                    | 0.115               | 0.676               | 0.652                | 0.353         | 0.548         | 0.567         | 0.579         | 0.743          | 0.628         | 0.628         | 0.426                   | 0.560                   | 0.213                         | 0.451                   | 0.573                   | 0.518               | 0.079                    | 0.548                 | 0.542                 | 0.548                 | 0.560                 | 0.243                 | 0.067                 | 0.750    | 0.701                      | 0.402                    | 0.615                          | 0.170             | 0.018                    | 0.567                  | 0.378                    | 0.524                    | 0.560                    | 0.451                      | 0.219                      | 0.786 | 0.731             | 0.560             | 0.646                   | 0.689                   | 0.719                   | 0.750                   | 0.798                    | 0.847                | 0.493           | 0.048                     | 0.359                      |
 HUMANEVALX                                  | 0.510                    | 0.672                    | 0.071               | 0.607               | 0.658                | 0.298         | 0.558         | 0.569         | 0.577         | 0.697          | 0.424         | 0.563         | 0.337                   | 0.424                   | 0.219                         | 0.369                   | 0.500                   | 0.306               | 0.158                    | 0.280                 | 0.453                 | 0.363                 | 0.502                 | 0.134                 | 0.276                 | 0.565    | 0.668                      | 0.294                    | 0.369                          | 0.229             | 0.050                    | 0.441                  | 0.217                    | 0.264                    | 0.306                    | 0.243                      | 0.073                      | 0.699 | 0.634             | 0.436             | 0.569                   | 0.514                   | 0.630                   | 0.680                   | 0.634                    | 0.804                | 0.323           | 0.211                     | 0.193                      |
 CRUXEVAL_input                              | 0.333                    | 0.377                    | 0.210               | 0.411               | 0.448                | 0.321         | 0.455         | 0.443         | 0.462         | 0.485          | 0.416         | 0.406         | 0.288                   | 0.317                   | 0.077                         | 0.298                   | 0.448                   | 0.367               | 0.340                    | 0.405                 | 0.408                 | 0.440                 | 0.435                 | 0.162                 | 0.353                 | 0.367    | 0.428                      | 0.276                    | 0.442                          | 0.323             | 0.383                    | 0.372                  | 0.375                    | 0.390                    | 0.398                    | 0.388                      | 0.456                      | 0.447 | 0.351             | 0.298             | 0.350                   | 0.331                   | 0.387                   | 0.412                   | 0.541                    | 0.517                | 0.200           | 0.131                     | 0.438                      |
 CRUXEVAL_output                             | 0.262                    | 0.348                    | 0.152               | 0.355               | 0.410                | 0.280         | 0.373         | 0.372         | 0.375         | 0.482          | 0.356         | 0.338         | 0.282                   | 0.351                   | 0.171                         | 0.253                   | 0.336                   | 0.276               | 0.318                    | 0.352                 | 0.341                 | 0.356                 | 0.360                 | 0.201                 | 0.291                 | 0.403    | 0.377                      | 0.303                    | 0.365                          | 0.318             | 0.323                    | 0.340                  | 0.321                    | 0.340                    | 0.342                    | 0.296                      | 0.423                      | 0.463 | 0.050             | 0.175             | 0.275                   | 0.311                   | 0.382                   | 0.386                   | 0.471                    | 0.455                | 0.368           | 0.222                     | 0.388                      |
 CRUXEVAL                                    | 0.298                    | 0.363                    | 0.181               | 0.383               | 0.429                | 0.300         | 0.414         | 0.408         | 0.418         | 0.483          | 0.386         | 0.372         | 0.285                   | 0.334                   | 0.124                         | 0.276                   | 0.392                   | 0.321               | 0.329                    | 0.378                 | 0.375                 | 0.398                 | 0.397                 | 0.181                 | 0.322                 | 0.385    | 0.403                      | 0.290                    | 0.403                          | 0.321             | 0.353                    | 0.356                  | 0.348                    | 0.365                    | 0.370                    | 0.342                      | 0.440                      | 0.455 | 0.200             | 0.236             | 0.312                   | 0.321                   | 0.385                   | 0.399                   | 0.506                    | 0.486                | 0.284           | 0.176                     | 0.413                      |
 CRUXEVALFIM_input                           |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             |  -            |  -            |  -                      |  -                      |  -                            |  -                      |  -                      |  -                  |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    |  -       |  -                         |  -                       | 0.195                          |  -                |  -                       |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    |  -                |  -                |  -                      |  -                      |  -                      |  -                      |  -                       |  -                   |  -              |  -                        |  -                         |
 CRUXEVALFIM_output                          |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             |  -            |  -            |  -                      |  -                      |  -                            |  -                      |  -                      |  -                  |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    |  -       |  -                         |  -                       | 0.026                          |  -                |  -                       |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    |  -                |  -                |  -                      |  -                      |  -                      |  -                      |  -                       |  -                   |  -              |  -                        |  -                         |
 CRUXEVALFIM                                 |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             |  -            |  -            |  -                      |  -                      |  -                            |  -                      |  -                      |  -                  |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    |  -       |  -                         |  -                       | 0.110                          |  -                |  -                       |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    |  -                |  -                |  -                      |  -                      |  -                      |  -                      |  -                       |  -                   |  -              |  -                        |  -                         |
 TQA_mc                                      | 0.537                    | 0.674                    | 0.146               | 0.523               | 0.510                | 0.356         | 0.679         | 0.696         | 0.701         | 0.767          | 0.636         | 0.640         | 0.402                   | 0.559                   | 0.072                         | 0.376                   | 0.576                   | 0.648               | 0.507                    | 0.629                 | 0.626                 | 0.630                 | 0.564                 | 0.274                 | 0.555                 | 0.583    | 0.561                      | 0.549                    | 0.542                          | 0.624             | 0.547                    | 0.653                  | 0.631                    | 0.643                    | 0.621                    | 0.581                      | 0.742                      | 0.725 | 0.503             | 0.532             | 0.516                   | 0.548                   | 0.654                   | 0.657                   | 0.747                    | 0.804                | 0.795           | 0.563                     | 0.709                      |
 TQA_tf                                      | 0.510                    | 0.647                    | 0.381               | 0.410               | 0.431                | 0.510         | 0.675         | 0.719         | 0.692         | 0.725          | 0.484         | 0.457         | 0.421                   | 0.473                   | 0.250                         | 0.510                   | 0.343                   | 0.593               | 0.504                    | 0.643                 | 0.552                 | 0.563                 | 0.512                 | 0.063                 | 0.566                 | 0.569    | 0.572                      | 0.548                    | 0.479                          | 0.548             | 0.536                    | 0.578                  | 0.532                    | 0.541                    | 0.483                    | 0.487                      | 0.670                      | 0.686 | 0.435             | 0.602             | 0.414                   | 0.300                   | 0.574                   | 0.568                   | 0.706                    | 0.731                | 0.523           | 0.487                     | 0.485                      |
 TQA                                         | 0.514                    | 0.650                    | 0.354               | 0.423               | 0.440                | 0.492         | 0.675         | 0.716         | 0.693         | 0.730          | 0.502         | 0.478         | 0.419                   | 0.483                   | 0.229                         | 0.495                   | 0.370                   | 0.599               | 0.505                    | 0.641                 | 0.560                 | 0.571                 | 0.518                 | 0.088                 | 0.565                 | 0.571    | 0.571                      | 0.548                    | 0.486                          | 0.556             | 0.537                    | 0.587                  | 0.544                    | 0.553                    | 0.499                    | 0.498                      | 0.679                      | 0.691 | 0.442             | 0.594             | 0.426                   | 0.329                   | 0.583                   | 0.578                   | 0.711                    | 0.740                | 0.554           | 0.496                     | 0.511                      |
 ARC_challenge                               | 0.709                    | 0.825                    | 0.374               | 0.809               | 0.819                | 0.671         | 0.874         | 0.881         | 0.882         | 0.897          | 0.835         | 0.853         | 0.614                   | 0.742                   | 0.063                         | 0.599                   | 0.744                   | 0.812               | 0.732                    | 0.794                 | 0.769                 | 0.766                 | 0.776                 | 0.342                 | 0.706                 | 0.825    | 0.775                      | 0.688                    | 0.773                          | 0.797             | 0.819                    | 0.838                  | 0.808                    | 0.833                    | 0.813                    | 0.802                      | 0.888                      | 0.911 | 0.818             | 0.796             | 0.750                   | 0.777                   | 0.843                   | 0.851                   | 0.911                    | 0.934                | 0.917           | 0.755                     | 0.871                      |
 ARC_easy                                    | 0.865                    | 0.937                    | 0.598               | 0.925               | 0.933                | 0.846         | 0.949         | 0.950         | 0.952         | 0.963          | 0.933         | 0.940         | 0.808                   | 0.893                   | 0.105                         | 0.783                   | 0.887                   | 0.936               | 0.883                    | 0.914                 | 0.898                 | 0.904                 | 0.906                 | 0.576                 | 0.843                 | 0.937    | 0.910                      | 0.843                    | 0.908                          | 0.910             | 0.914                    | 0.943                  | 0.926                    | 0.935                    | 0.934                    | 0.932                      | 0.965                      | 0.970 | 0.923             | 0.918             | 0.895                   | 0.904                   | 0.945                   | 0.946                   | 0.969                    | 0.978                | 0.975           | 0.899                     | 0.962                      |
 ARC                                         | 0.813                    | 0.900                    | 0.524               | 0.886               | 0.895                | 0.788         | 0.925         | 0.927         | 0.929         | 0.941          | 0.901         | 0.911         | 0.744                   | 0.843                   | 0.091                         | 0.722                   | 0.839                   | 0.895               | 0.833                    | 0.874                 | 0.855                 | 0.858                 | 0.863                 | 0.498                 | 0.798                 | 0.900    | 0.866                      | 0.792                    | 0.864                          | 0.873             | 0.883                    | 0.908                  | 0.887                    | 0.901                    | 0.894                    | 0.889                      | 0.939                      | 0.950 | 0.888             | 0.878             | 0.847                   | 0.862                   | 0.911                   | 0.915                   | 0.950                    | 0.963                | 0.956           | 0.851                     | 0.932                      |
 RACE_high                                   | 0.626                    | 0.753                    | 0.431               | 0.698               | 0.730                | 0.580         | 0.817         | 0.817         | 0.802         | 0.833          | 0.788         | 0.787         | 0.553                   | 0.642                   | 0.066                         | 0.469                   | 0.631                   | 0.826               | 0.641                    | 0.756                 | 0.679                 | 0.676                 | 0.679                 | 0.377                 | 0.589                 | 0.771    | 0.736                      | 0.607                    | 0.726                          | 0.771             | 0.773                    | 0.696                  | 0.625                    | 0.648                    | 0.613                    | 0.625                      | 0.779                      | 0.819 | 0.779             | 0.762             | 0.698                   | 0.712                   | 0.779                   | 0.788                   | 0.852                    | 0.882                | 0.871           | 0.741                     | 0.764                      |
 RACE_middle                                 | 0.704                    | 0.806                    | 0.463               | 0.777               | 0.793                | 0.610         | 0.860         | 0.860         | 0.849         | 0.883          | 0.816         | 0.825         | 0.631                   | 0.735                   | 0.089                         | 0.579                   | 0.706                   | 0.863               | 0.708                    | 0.817                 | 0.747                 | 0.744                 | 0.734                 | 0.396                 | 0.680                 | 0.825    | 0.800                      | 0.696                    | 0.782                          | 0.807             | 0.834                    | 0.750                  | 0.697                    | 0.722                    | 0.706                    | 0.692                      | 0.832                      | 0.861 | 0.827             | 0.811             | 0.775                   | 0.776                   | 0.841                   | 0.853                   | 0.887                    | 0.923                |  -              | 0.809                     | 0.824                      |
 RACE                                        | 0.648                    | 0.769                    | 0.440               | 0.721               | 0.748                | 0.589         | 0.830         | 0.829         | 0.816         | 0.847          | 0.796         | 0.798         | 0.576                   | 0.669                   | 0.072                         | 0.501                   | 0.653                   | 0.837               | 0.660                    | 0.774                 | 0.699                 | 0.696                 | 0.695                 | 0.382                 | 0.615                 | 0.786    | 0.755                      | 0.633                    | 0.743                          | 0.781             | 0.791                    | 0.712                  | 0.646                    | 0.670                    | 0.640                    | 0.645                      | 0.795                      | 0.831 | 0.793             | 0.776             | 0.720                   | 0.730                   | 0.797                   | 0.807                   | 0.862                    | 0.894                | 0.871           | 0.761                     | 0.781                      |
 MMLU
 abstract_algebra                            | 0.220                    | 0.350                    | 0.180               | 0.410               | 0.450                | 0.140         | 0.320         | 0.320         | 0.330         | 0.310          | 0.220         | 0.210         | 0.170                   | 0.200                   | 0.020                         | 0.140                   | 0.270                   | 0.230               | 0.140                    | 0.390                 | 0.210                 | 0.140                 | 0.200                 | 0.140                 | 0.270                 | 0.370    | 0.210                      | 0.190                    | 0.330                          | 0.220             | 0.170                    | 0.340                  | 0.330                    | 0.250                    | 0.300                    | 0.210                      | 0.390                      | 0.410 | 0.480             | 0.370             | 0.240                   | 0.250                   | 0.440                   | 0.430                   | 0.570                    | 0.600                |  -              | 0.140                     | 0.340                      |
 anatomy                                     | 0.370                    | 0.540                    | 0.318               | 0.577               | 0.592                | 0.414         | 0.604         | 0.611         | 0.626         | 0.607          | 0.503         | 0.511         | 0.362                   | 0.474                   | 0.066                         | 0.407                   | 0.511                   | 0.614               | 0.544                    | 0.688                 | 0.540                 | 0.511                 | 0.555                 | 0.348                 | 0.540                 | 0.585    | 0.540                      | 0.447                    | 0.555                          | 0.552             | 0.537                    | 0.562                  | 0.540                    | 0.577                    | 0.570                    | 0.585                      | 0.666                      | 0.703 | 0.488             | 0.488             | 0.525                   | 0.562                   | 0.622                   | 0.622                   | 0.644                    | 0.733                |  -              | 0.477                     | 0.607                      |
 astronomy                                   | 0.513                    | 0.723                    | 0.263               | 0.736               | 0.756                | 0.467         | 0.753         | 0.740         | 0.760         | 0.828          | 0.644         | 0.651         | 0.519                   | 0.651                   | 0.059                         | 0.493                   | 0.631                   | 0.723               | 0.640                    | 0.723                 | 0.651                 | 0.657                 | 0.677                 | 0.421                 | 0.565                 | 0.756    | 0.671                      | 0.573                    | 0.651                          | 0.620             | 0.646                    | 0.710                  | 0.684                    | 0.677                    | 0.703                    | 0.703                      | 0.796                      | 0.776 | 0.723             | 0.651             | 0.618                   | 0.657                   | 0.763                   | 0.769                   | 0.868                    | 0.875                |  -              | 0.586                     | 0.756                      |
 business_ethics                             | 0.550                    | 0.600                    | 0.260               | 0.570               | 0.560                | 0.430         | 0.620         | 0.630         | 0.620         | 0.670          | 0.570         | 0.610         | 0.460                   | 0.490                   | 0.020                         | 0.400                   | 0.530                   | 0.640               | 0.510                    | 0.640                 | 0.540                 | 0.520                 | 0.550                 | 0.280                 | 0.480                 | 0.640    | 0.540                      | 0.520                    | 0.630                          | 0.540             | 0.530                    | 0.620                  | 0.570                    | 0.620                    | 0.620                    | 0.620                      | 0.710                      | 0.740 | 0.670             | 0.620             | 0.630                   | 0.590                   | 0.680                   | 0.710                   | 0.750                    | 0.800                |  -              | 0.570                     | 0.740                      |
 clinical_knowledge                          | 0.528                    | 0.649                    | 0.373               | 0.652               | 0.683                | 0.550         | 0.724         | 0.743         | 0.743         | 0.788          | 0.618         | 0.622         | 0.490                   | 0.584                   | 0.105                         | 0.475                   | 0.656                   | 0.716               | 0.690                    | 0.750                 | 0.656                 | 0.698                 | 0.675                 | 0.298                 | 0.592                 | 0.701    | 0.664                      | 0.581                    | 0.664                          | 0.637             | 0.649                    | 0.716                  | 0.709                    | 0.686                    | 0.713                    | 0.698                      | 0.750                      | 0.781 | 0.686             | 0.671             | 0.633                   | 0.645                   | 0.709                   | 0.713                   | 0.803                    | 0.815                |  -              | 0.577                     | 0.735                      |
 college_biology                             | 0.590                    | 0.715                    | 0.340               | 0.763               | 0.777                | 0.625         | 0.847         | 0.833         | 0.854         | 0.895          | 0.687         | 0.715         | 0.486                   | 0.659                   | 0.013                         | 0.465                   | 0.673                   | 0.701               | 0.652                    | 0.777                 | 0.722                 | 0.687                 | 0.722                 | 0.250                 | 0.625                 | 0.736    | 0.708                      | 0.625                    | 0.694                          | 0.631             | 0.659                    | 0.812                  | 0.756                    | 0.791                    | 0.805                    | 0.763                      | 0.819                      | 0.868 | 0.743             | 0.701             | 0.694                   | 0.694                   | 0.784                   | 0.784                   | 0.854                    | 0.923                |  -              | 0.618                     | 0.833                      |
 college_chemistry                           | 0.320                    | 0.390                    | 0.180               | 0.470               | 0.430                | 0.330         | 0.440         | 0.450         | 0.470         | 0.430          | 0.380         | 0.380         | 0.340                   | 0.350                   | 0.020                         | 0.310                   | 0.380                   | 0.410               | 0.380                    | 0.490                 | 0.390                 | 0.390                 | 0.400                 | 0.160                 | 0.310                 | 0.400    | 0.370                      | 0.350                    | 0.340                          | 0.380             | 0.400                    | 0.440                  | 0.450                    | 0.450                    | 0.460                    | 0.430                      | 0.440                      | 0.520 | 0.380             | 0.340             | 0.310                   | 0.370                   | 0.480                   | 0.490                   | 0.460                    | 0.530                |  -              | 0.330                     | 0.450                      |
 college_computer_science                    | 0.360                    | 0.520                    | 0.110               | 0.540               | 0.590                | 0.290         | 0.480         | 0.460         | 0.460         | 0.580          | 0.470         | 0.480         | 0.250                   | 0.400                   | 0.060                         | 0.260                   | 0.400                   | 0.490               | 0.340                    | 0.500                 | 0.380                 | 0.340                 | 0.400                 | 0.200                 | 0.350                 | 0.550    | 0.410                      | 0.320                    | 0.400                          | 0.440             | 0.400                    | 0.480                  | 0.470                    | 0.440                    | 0.480                    | 0.410                      | 0.510                      | 0.600 | 0.520             | 0.460             | 0.390                   | 0.460                   | 0.620                   | 0.590                   | 0.630                    | 0.720                |  -              | 0.370                     | 0.480                      |
 college_mathematics                         | 0.170                    | 0.240                    | 0.090               | 0.320               | 0.320                | 0.100         | 0.290         | 0.270         | 0.260         | 0.300          | 0.240         | 0.280         | 0.120                   | 0.180                   | 0.030                         | 0.200                   | 0.200                   | 0.230               | 0.200                    | 0.330                 | 0.220                 | 0.200                 | 0.260                 | 0.150                 | 0.210                 | 0.310    | 0.180                      | 0.180                    | 0.180                          | 0.200             | 0.200                    | 0.300                  | 0.270                    | 0.200                    | 0.270                    | 0.170                      | 0.340                      | 0.340 | 0.260             | 0.280             | 0.200                   | 0.180                   | 0.380                   | 0.350                   | 0.490                    | 0.540                |  -              | 0.170                     | 0.310                      |
 college_medicine                            | 0.497                    | 0.618                    | 0.283               | 0.566               | 0.612                | 0.491         | 0.635         | 0.641         | 0.658         | 0.716          | 0.572         | 0.589         | 0.439                   | 0.520                   | 0.034                         | 0.473                   | 0.514                   | 0.606               | 0.543                    | 0.664                 | 0.612                 | 0.624                 | 0.589                 | 0.254                 | 0.491                 | 0.583    | 0.543                      | 0.456                    | 0.572                          | 0.566             | 0.543                    | 0.618                  | 0.572                    | 0.572                    | 0.612                    | 0.566                      | 0.682                      | 0.728 | 0.589             | 0.549             | 0.560                   | 0.606                   | 0.606                   | 0.624                   | 0.710                    | 0.739                |  -              | 0.485                     | 0.653                      |
 college_physics                             | 0.284                    | 0.343                    | 0.186               | 0.372               | 0.411                | 0.235         | 0.401         | 0.382         | 0.352         | 0.421          | 0.313         | 0.323         | 0.215                   | 0.225                   | 0.088                         | 0.215                   | 0.274                   | 0.362               | 0.313                    | 0.450                 | 0.313                 | 0.294                 | 0.313                 | 0.205                 | 0.303                 | 0.333    | 0.264                      | 0.254                    | 0.245                          | 0.205             | 0.303                    | 0.362                  | 0.392                    | 0.352                    | 0.333                    | 0.294                      | 0.372                      | 0.529 | 0.343             | 0.333             | 0.382                   | 0.392                   | 0.401                   | 0.372                   | 0.519                    | 0.656                |  -              | 0.245                     | 0.382                      |
 computer_security                           | 0.590                    | 0.590                    | 0.370               | 0.710               | 0.690                | 0.580         | 0.710         | 0.740         | 0.730         | 0.710          | 0.710         | 0.730         | 0.630                   | 0.690                   | 0.070                         | 0.560                   | 0.670                   | 0.690               | 0.680                    | 0.770                 | 0.700                 | 0.690                 | 0.690                 | 0.360                 | 0.620                 | 0.740    | 0.640                      | 0.600                    | 0.680                          | 0.670             | 0.660                    | 0.700                  | 0.690                    | 0.680                    | 0.700                    | 0.650                      | 0.700                      | 0.730 | 0.650             | 0.670             | 0.650                   | 0.690                   | 0.720                   | 0.710                   | 0.730                    | 0.800                |  -              | 0.610                     | 0.700                      |
 conceptual_physics                          | 0.374                    | 0.570                    | 0.234               | 0.680               | 0.680                | 0.395         | 0.608         | 0.629         | 0.638         | 0.727          | 0.561         | 0.587         | 0.353                   | 0.463                   | 0.029                         | 0.314                   | 0.442                   | 0.612               | 0.404                    | 0.570                 | 0.455                 | 0.476                 | 0.463                 | 0.208                 | 0.361                 | 0.587    | 0.404                      | 0.365                    | 0.446                          | 0.468             | 0.472                    | 0.604                  | 0.519                    | 0.565                    | 0.565                    | 0.553                      | 0.685                      | 0.748 | 0.561             | 0.557             | 0.485                   | 0.519                   | 0.642                   | 0.642                   | 0.800                    | 0.834                |  -              | 0.442                     | 0.693                      |
 econometrics                                | 0.289                    | 0.403                    | 0.122               | 0.649               | 0.587                | 0.271         | 0.566         | 0.566         | 0.557         | 0.587          | 0.456         | 0.464         | 0.228                   | 0.377                   | 0.035                         | 0.219                   | 0.412                   | 0.482               | 0.371                    | 0.552                 | 0.447                 | 0.438                 | 0.482                 | 0.114                 | 0.359                 | 0.535    | 0.333                      | 0.318                    | 0.429                          | 0.362             | 0.424                    | 0.535                  | 0.473                    | 0.456                    | 0.456                    | 0.421                      | 0.543                      | 0.596 | 0.535             | 0.508             | 0.421                   | 0.438                   | 0.605                   | 0.596                   | 0.649                    | 0.675                |  -              | 0.345                     | 0.526                      |
 electrical_engineering                      | 0.406                    | 0.489                    | 0.220               | 0.641               | 0.648                | 0.462         | 0.558         | 0.558         | 0.558         | 0.593          | 0.544         | 0.572         | 0.358                   | 0.427                   | 0.048                         | 0.337                   | 0.420                   | 0.544               | 0.468                    | 0.641                 | 0.558                 | 0.496                 | 0.524                 | 0.248                 | 0.462                 | 0.586    | 0.455                      | 0.393                    | 0.482                          | 0.468             | 0.510                    | 0.510                  | 0.544                    | 0.468                    | 0.496                    | 0.475                      | 0.565                      | 0.634 | 0.606             | 0.524             | 0.441                   | 0.434                   | 0.606                   | 0.606                   | 0.648                    | 0.703                |  -              | 0.324                     | 0.586                      |
 elementary_mathematics                      | 0.312                    | 0.462                    | 0.113               | 0.505               | 0.497                | 0.261         | 0.484         | 0.470         | 0.476         | 0.476          | 0.367         | 0.373         | 0.171                   | 0.277                   | 0.044                         | 0.211                   | 0.293                   | 0.529               | 0.296                    | 0.481                 | 0.333                 | 0.312                 | 0.357                 | 0.134                 | 0.280                 | 0.526    | 0.309                      | 0.222                    | 0.312                          | 0.283             | 0.304                    | 0.428                  | 0.412                    | 0.373                    | 0.423                    | 0.388                      | 0.537                      | 0.544 | 0.481             | 0.497             | 0.407                   | 0.417                   | 0.560                   | 0.568                   | 0.791                    | 0.838                |  -              | 0.304                     | 0.455                      |
 formal_logic                                | 0.341                    | 0.468                    | 0.182               | 0.444               | 0.484                | 0.214         | 0.412         | 0.420         | 0.293         | 0.468          | 0.325         | 0.357         | 0.230                   | 0.333                   | 0.039                         | 0.261                   | 0.380                   | 0.396               | 0.261                    | 0.460                 | 0.373                 | 0.357                 | 0.420                 | 0.174                 | 0.253                 | 0.404    | 0.349                      | 0.277                    | 0.396                          | 0.261             | 0.261                    | 0.444                  | 0.420                    | 0.412                    | 0.452                    | 0.380                      | 0.523                      | 0.531 | 0.420             | 0.404             | 0.325                   | 0.341                   | 0.452                   | 0.428                   | 0.539                    | 0.626                |  -              | 0.190                     | 0.484                      |
 global_facts                                | 0.120                    | 0.220                    | 0.120               | 0.190               | 0.290                | 0.100         | 0.330         | 0.320         | 0.330         | 0.370          | 0.200         | 0.240         | 0.120                   | 0.230                   | 0.010                         | 0.140                   | 0.200                   | 0.390               | 0.140                    | 0.360                 | 0.160                 | 0.150                 | 0.150                 | 0.090                 | 0.110                 | 0.240    | 0.200                      | 0.160                    | 0.280                          | 0.160             | 0.210                    | 0.210                  | 0.190                    | 0.220                    | 0.240                    | 0.130                      | 0.360                      | 0.320 | 0.300             | 0.180             | 0.140                   | 0.200                   | 0.260                   | 0.260                   | 0.470                    | 0.430                |  -              | 0.220                     | 0.240                      |
 high_school_biology                         | 0.664                    | 0.783                    | 0.348               | 0.764               | 0.774                | 0.651         | 0.845         | 0.845         | 0.851         | 0.890          | 0.800         | 0.809         | 0.577                   | 0.696                   | 0.061                         | 0.525                   | 0.693                   | 0.790               | 0.680                    | 0.812                 | 0.732                 | 0.738                 | 0.729                 | 0.358                 | 0.677                 | 0.780    | 0.741                      | 0.654                    | 0.748                          | 0.677             | 0.706                    | 0.809                  | 0.770                    | 0.774                    | 0.793                    | 0.774                      | 0.861                      | 0.887 | 0.761             | 0.745             | 0.722                   | 0.754                   | 0.803                   | 0.806                   | 0.845                    | 0.896                |  -              | 0.670                     | 0.858                      |
 high_school_chemistry                       | 0.374                    | 0.487                    | 0.216               | 0.522               | 0.507                | 0.315         | 0.571         | 0.561         | 0.586         | 0.600          | 0.546         | 0.517         | 0.295                   | 0.413                   | 0.029                         | 0.320                   | 0.384                   | 0.522               | 0.438                    | 0.635                 | 0.453                 | 0.458                 | 0.467                 | 0.211                 | 0.433                 | 0.517    | 0.389                      | 0.310                    | 0.379                          | 0.384             | 0.428                    | 0.522                  | 0.536                    | 0.482                    | 0.512                    | 0.492                      | 0.551                      | 0.655 | 0.467             | 0.448             | 0.413                   | 0.463                   | 0.532                   | 0.536                   | 0.596                    | 0.724                |  -              | 0.315                     | 0.581                      |
 high_school_computer_science                | 0.540                    | 0.700                    | 0.250               | 0.740               | 0.740                | 0.440         | 0.690         | 0.720         | 0.710         | 0.770          | 0.660         | 0.660         | 0.500                   | 0.630                   | 0.040                         | 0.390                   | 0.580                   | 0.700               | 0.620                    | 0.670                 | 0.610                 | 0.590                 | 0.610                 | 0.230                 | 0.540                 | 0.770    | 0.610                      | 0.490                    | 0.630                          | 0.580             | 0.560                    | 0.680                  | 0.620                    | 0.610                    | 0.610                    | 0.580                      | 0.690                      | 0.870 | 0.710             | 0.610             | 0.600                   | 0.660                   | 0.770                   | 0.770                   | 0.830                    | 0.870                |  -              | 0.560                     | 0.720                      |
 high_school_european_history                | 0.624                    | 0.733                    | 0.490               | 0.757               | 0.745                | 0.672         | 0.800         | 0.800         | 0.806         | 0.830          | 0.812         | 0.830         | 0.624                   | 0.745                   | 0.121                         | 0.557                   | 0.684                   | 0.751               | 0.703                    | 0.642                 | 0.690                 | 0.696                 | 0.709                 | 0.369                 | 0.672                 | 0.775    | 0.696                      | 0.678                    | 0.709                          | 0.733             | 0.733                    | 0.751                  | 0.715                    | 0.690                    | 0.727                    | 0.672                      | 0.806                      | 0.812 | 0.751             | 0.715             | 0.733                   | 0.733                   | 0.787                   | 0.800                   | 0.824                    | 0.818                |  -              | 0.745                     | 0.787                      |
 high_school_geography                       | 0.595                    | 0.777                    | 0.393               | 0.717               | 0.747                | 0.676         | 0.863         | 0.868         | 0.878         | 0.888          | 0.792         | 0.818         | 0.555                   | 0.732                   | 0.070                         | 0.570                   | 0.707                   | 0.843               | 0.727                    | 0.797                 | 0.772                 | 0.747                 | 0.757                 | 0.419                 | 0.671                 | 0.838    | 0.727                      | 0.671                    | 0.752                          | 0.727             | 0.732                    | 0.813                  | 0.787                    | 0.747                    | 0.792                    | 0.737                      | 0.843                      | 0.888 | 0.797             | 0.787             | 0.712                   | 0.732                   | 0.833                   | 0.833                   | 0.868                    | 0.883                |  -              | 0.717                     | 0.843                      |
 high_school_government_and_politics         | 0.694                    | 0.839                    | 0.487               | 0.875               | 0.875                | 0.730         | 0.921         | 0.926         | 0.926         | 0.963          | 0.875         | 0.870         | 0.709                   | 0.823                   | 0.056                         | 0.658                   | 0.854                   | 0.865               | 0.821                    | 0.834                 | 0.808                 | 0.792                 | 0.818                 | 0.326                 | 0.725                 | 0.901    | 0.849                      | 0.805                    | 0.875                          | 0.863             | 0.836                    | 0.896                  | 0.823                    | 0.875                    | 0.849                    | 0.834                      | 0.937                      | 0.937 | 0.865             | 0.839             | 0.772                   | 0.797                   | 0.917                   | 0.917                   | 0.958                    | 0.968                |  -              | 0.805                     | 0.917                      |
 high_school_macroeconomics                  | 0.497                    | 0.630                    | 0.235               | 0.653               | 0.687                | 0.487         | 0.704         | 0.706         | 0.717         | 0.758          | 0.651         | 0.653         | 0.415                   | 0.520                   | 0.035                         | 0.420                   | 0.494                   | 0.633               | 0.498                    | 0.656                 | 0.558                 | 0.535                 | 0.556                 | 0.176                 | 0.497                 | 0.656    | 0.525                      | 0.478                    | 0.528                          | 0.532             | 0.521                    | 0.682                  | 0.661                    | 0.635                    | 0.646                    | 0.635                      | 0.756                      | 0.807 | 0.687             | 0.643             | 0.564                   | 0.592                   | 0.684                   | 0.684                   | 0.802                    | 0.825                |  -              | 0.496                     | 0.710                      |
 high_school_mathematics                     | 0.200                    | 0.307                    | 0.088               | 0.344               | 0.337                | 0.200         | 0.307         | 0.325         | 0.277         | 0.325          | 0.237         | 0.240         | 0.159                   | 0.177                   | 0.044                         | 0.155                   | 0.185                   | 0.348               | 0.162                    | 0.455                 | 0.244                 | 0.225                 | 0.255                 | 0.100                 | 0.233                 | 0.355    | 0.270                      | 0.162                    | 0.162                          | 0.237             | 0.203                    | 0.259                  | 0.244                    | 0.203                    | 0.214                    | 0.203                      | 0.281                      | 0.274 | 0.351             | 0.414             | 0.270                   | 0.244                   | 0.440                   | 0.422                   | 0.500                    | 0.537                |  -              | 0.174                     | 0.266                      |
 high_school_microeconomics                  | 0.588                    | 0.756                    | 0.268               | 0.823               | 0.827                | 0.521         | 0.780         | 0.772         | 0.801         | 0.852          | 0.760         | 0.773         | 0.462                   | 0.634                   | 0.025                         | 0.516                   | 0.634                   | 0.714               | 0.654                    | 0.756                 | 0.647                 | 0.630                 | 0.684                 | 0.260                 | 0.575                 | 0.802    | 0.609                      | 0.540                    | 0.630                          | 0.603             | 0.654                    | 0.798                  | 0.789                    | 0.773                    | 0.794                    | 0.743                      | 0.886                      | 0.861 | 0.802             | 0.773             | 0.672                   | 0.697                   | 0.827                   | 0.827                   | 0.857                    | 0.907                |  -              | 0.594                     | 0.848                      |
 high_school_physics                         | 0.178                    | 0.384                    | 0.099               | 0.509               | 0.496                | 0.218         | 0.463         | 0.463         | 0.423         | 0.496          | 0.344         | 0.364         | 0.145                   | 0.225                   | 0.039                         | 0.211                   | 0.198                   | 0.337               | 0.211                    | 0.456                 | 0.298                 | 0.337                 | 0.317                 | 0.105                 | 0.211                 | 0.443    | 0.284                      | 0.165                    | 0.251                          | 0.245             | 0.331                    | 0.423                  | 0.384                    | 0.384                    | 0.377                    | 0.384                      | 0.463                      | 0.569 | 0.298             | 0.344             | 0.317                   | 0.311                   | 0.470                   | 0.456                   | 0.635                    | 0.695                |  -              | 0.192                     | 0.456                      |
 high_school_psychology                      | 0.724                    | 0.820                    | 0.445               | 0.827               | 0.853                | 0.761         | 0.895         | 0.889         | 0.896         | 0.910          | 0.840         | 0.858         | 0.669                   | 0.785                   | 0.042                         | 0.660                   | 0.776                   | 0.838               | 0.771                    | 0.864                 | 0.840                 | 0.831                 | 0.834                 | 0.456                 | 0.761                 | 0.855    | 0.809                      | 0.764                    | 0.814                          | 0.817             | 0.797                    | 0.858                  | 0.838                    | 0.823                    | 0.855                    | 0.844                      | 0.884                      | 0.904 | 0.827             | 0.807             | 0.803                   | 0.796                   | 0.858                   | 0.856                   | 0.882                    | 0.902                |  -              | 0.779                     | 0.880                      |
 high_school_statistics                      | 0.412                    | 0.532                    | 0.185               | 0.564               | 0.625                | 0.347         | 0.592         | 0.601         | 0.574         | 0.615          | 0.509         | 0.500         | 0.319                   | 0.393                   | 0.037                         | 0.398                   | 0.453                   | 0.490               | 0.393                    | 0.569                 | 0.458                 | 0.435                 | 0.462                 | 0.240                 | 0.342                 | 0.615    | 0.467                      | 0.361                    | 0.476                          | 0.402             | 0.421                    | 0.550                  | 0.504                    | 0.472                    | 0.569                    | 0.523                      | 0.615                      | 0.643 | 0.550             | 0.583             | 0.481                   | 0.518                   | 0.615                   | 0.648                   | 0.717                    | 0.782                |  -              | 0.407                     | 0.555                      |
 high_school_us_history                      | 0.617                    | 0.784                    | 0.436               | 0.764               | 0.794                | 0.656         | 0.829         | 0.829         | 0.829         | 0.867          | 0.833         | 0.867         | 0.602                   | 0.740                   | 0.053                         | 0.588                   | 0.750                   | 0.759               | 0.709                    | 0.710                 | 0.764                 | 0.759                 | 0.784                 | 0.348                 | 0.696                 | 0.828    | 0.823                      | 0.699                    | 0.799                          | 0.782             | 0.792                    | 0.794                  | 0.754                    | 0.764                    | 0.759                    | 0.735                      | 0.833                      | 0.877 | 0.803             | 0.779             | 0.715                   | 0.759                   | 0.843                   | 0.852                   | 0.882                    | 0.906                |  -              | 0.803                     | 0.857                      |
 high_school_world_history                   | 0.670                    | 0.810                    | 0.535               | 0.759               | 0.818                | 0.700         | 0.881         | 0.864         | 0.872         | 0.881          | 0.810         | 0.827         | 0.637                   | 0.763                   | 0.088                         | 0.594                   | 0.751                   | 0.780               | 0.745                    | 0.746                 | 0.805                 | 0.784                 | 0.789                 | 0.405                 | 0.725                 | 0.793    | 0.776                      | 0.720                    | 0.797                          | 0.750             | 0.826                    | 0.759                  | 0.759                    | 0.729                    | 0.746                    | 0.742                      | 0.835                      | 0.869 | 0.805             | 0.784             | 0.776                   | 0.793                   | 0.818                   | 0.827                   | 0.869                    | 0.877                |  -              | 0.783                     | 0.848                      |
 human_aging                                 | 0.484                    | 0.587                    | 0.309               | 0.596               | 0.627                | 0.497         | 0.681         | 0.690         | 0.690         | 0.739          | 0.582         | 0.591         | 0.506                   | 0.560                   | 0.067                         | 0.434                   | 0.488                   | 0.650               | 0.614                    | 0.690                 | 0.569                 | 0.600                 | 0.618                 | 0.286                 | 0.569                 | 0.672    | 0.605                      | 0.542                    | 0.609                          | 0.632             | 0.623                    | 0.587                  | 0.565                    | 0.596                    | 0.582                    | 0.547                      | 0.672                      | 0.726 | 0.609             | 0.587             | 0.569                   | 0.587                   | 0.681                   | 0.690                   | 0.717                    | 0.771                |  -              | 0.587                     | 0.695                      |
 human_sexuality                             | 0.503                    | 0.603                    | 0.351               | 0.648               | 0.694                | 0.519         | 0.761         | 0.730         | 0.746         | 0.755          | 0.648         | 0.633         | 0.549                   | 0.702                   | 0.152                         | 0.572                   | 0.603                   | 0.702               | 0.638                    | 0.793                 | 0.671                 | 0.702                 | 0.671                 | 0.419                 | 0.587                 | 0.709    | 0.679                      | 0.569                    | 0.618                          | 0.615             | 0.646                    | 0.671                  | 0.648                    | 0.618                    | 0.664                    | 0.587                      | 0.748                      | 0.740 | 0.648             | 0.664             | 0.625                   | 0.625                   | 0.740                   | 0.717                   | 0.786                    | 0.839                |  -              | 0.584                     | 0.770                      |
 international_law                           | 0.644                    | 0.710                    | 0.404               | 0.727               | 0.776                | 0.644         | 0.785         | 0.785         | 0.801         | 0.760          | 0.735         | 0.752         | 0.628                   | 0.694                   | 0.099                         | 0.528                   | 0.677                   | 0.785               | 0.760                    | 0.826                 | 0.752                 | 0.727                 | 0.776                 | 0.388                 | 0.710                 | 0.743    | 0.752                      | 0.710                    | 0.694                          | 0.768             | 0.743                    | 0.768                  | 0.735                    | 0.694                    | 0.735                    | 0.727                      | 0.826                      | 0.892 | 0.776             | 0.768             | 0.710                   | 0.685                   | 0.768                   | 0.785                   | 0.834                    | 0.867                |  -              | 0.685                     | 0.859                      |
 jurisprudence                               | 0.629                    | 0.712                    | 0.444               | 0.740               | 0.768                | 0.611         | 0.794         | 0.794         | 0.785         | 0.833          | 0.675         | 0.722         | 0.638                   | 0.675                   | 0.064                         | 0.555                   | 0.657                   | 0.712               | 0.672                    | 0.796                 | 0.694                 | 0.740                 | 0.731                 | 0.333                 | 0.574                 | 0.731    | 0.722                      | 0.626                    | 0.750                          | 0.719             | 0.719                    | 0.777                  | 0.731                    | 0.722                    | 0.722                    | 0.750                      | 0.787                      | 0.787 | 0.787             | 0.740             | 0.694                   | 0.712                   | 0.759                   | 0.750                   | 0.824                    | 0.824                |  -              | 0.654                     | 0.824                      |
 logical_fallacies                           | 0.619                    | 0.717                    | 0.380               | 0.711               | 0.730                | 0.625         | 0.792         | 0.805         | 0.811         | 0.797          | 0.730         | 0.754         | 0.656                   | 0.711                   | 0.030                         | 0.619                   | 0.687                   | 0.785               | 0.691                    | 0.791                 | 0.736                 | 0.723                 | 0.736                 | 0.306                 | 0.687                 | 0.730    | 0.705                      | 0.660                    | 0.730                          | 0.666             | 0.691                    | 0.779                  | 0.773                    | 0.791                    | 0.785                    | 0.754                      | 0.852                      | 0.779 | 0.736             | 0.717             | 0.705                   | 0.723                   | 0.773                   | 0.766                   | 0.834                    | 0.877                |  -              | 0.641                     | 0.809                      |
 machine_learning                            | 0.348                    | 0.410                    | 0.196               | 0.508               | 0.491                | 0.241         | 0.401         | 0.410         | 0.437         | 0.571          | 0.419         | 0.401         | 0.276                   | 0.366                   | 0.035                         | 0.205                   | 0.419                   | 0.464               | 0.339                    | 0.410                 | 0.339                 | 0.312                 | 0.366                 | 0.080                 | 0.285                 | 0.419    | 0.366                      | 0.321                    | 0.366                          | 0.366             | 0.348                    | 0.437                  | 0.473                    | 0.383                    | 0.437                    | 0.375                      | 0.500                      | 0.544 | 0.383             | 0.375             | 0.339                   | 0.321                   | 0.437                   | 0.410                   | 0.526                    | 0.642                |  -              | 0.312                     | 0.526                      |
 management                                  | 0.708                    | 0.766                    | 0.417               | 0.825               | 0.786                | 0.737         | 0.805         | 0.825         | 0.825         | 0.844          | 0.737         | 0.766         | 0.572                   | 0.718                   | 0.097                         | 0.553                   | 0.699                   | 0.815               | 0.747                    | 0.834                 | 0.757                 | 0.757                 | 0.737                 | 0.427                 | 0.669                 | 0.815    | 0.766                      | 0.708                    | 0.699                          | 0.737             | 0.737                    | 0.815                  | 0.805                    | 0.786                    | 0.786                    | 0.776                      | 0.815                      | 0.854 | 0.737             | 0.718             | 0.689                   | 0.718                   | 0.805                   | 0.825                   | 0.825                    | 0.864                |  -              | 0.747                     | 0.786                      |
 marketing                                   | 0.722                    | 0.803                    | 0.517               | 0.820               | 0.854                | 0.760         | 0.871         | 0.880         | 0.863         | 0.893          | 0.850         | 0.858         | 0.726                   | 0.803                   | 0.064                         | 0.717                   | 0.773                   | 0.829               | 0.824                    | 0.876                 | 0.811                 | 0.824                 | 0.837                 | 0.465                 | 0.799                 | 0.863    | 0.794                      | 0.756                    | 0.811                          | 0.833             | 0.816                    | 0.846                  | 0.841                    | 0.824                    | 0.820                    | 0.803                      | 0.880                      | 0.914 | 0.841             | 0.837             | 0.811                   | 0.816                   | 0.888                   | 0.893                   | 0.897                    | 0.901                |  -              | 0.782                     | 0.846                      |
 medical_genetics                            | 0.570                    | 0.690                    | 0.340               | 0.720               | 0.750                | 0.580         | 0.780         | 0.750         | 0.780         | 0.810          | 0.630         | 0.640         | 0.570                   | 0.650                   | 0.060                         | 0.510                   | 0.600                   | 0.690               | 0.700                    | 0.770                 | 0.700                 | 0.690                 | 0.720                 | 0.320                 | 0.660                 | 0.770    | 0.660                      | 0.600                    | 0.740                          | 0.630             | 0.660                    | 0.760                  | 0.680                    | 0.710                    | 0.710                    | 0.700                      | 0.830                      | 0.860 | 0.690             | 0.660             | 0.660                   | 0.690                   | 0.770                   | 0.770                   | 0.820                    | 0.900                |  -              | 0.640                     | 0.820                      |
 miscellaneous                               | 0.637                    | 0.757                    | 0.420               | 0.749               | 0.768                | 0.698         | 0.832         | 0.832         | 0.830         | 0.854          | 0.775         | 0.796         | 0.650                   | 0.776                   | 0.094                         | 0.646                   | 0.766                   | 0.787               | 0.760                    | 0.822                 | 0.773                 | 0.776                 | 0.773                 | 0.454                 | 0.736                 | 0.796    | 0.759                      | 0.727                    | 0.782                          | 0.766             | 0.756                    | 0.795                  | 0.770                    | 0.756                    | 0.777                    | 0.759                      | 0.837                      | 0.864 | 0.814             | 0.798             | 0.724                   | 0.726                   | 0.807                   | 0.814                   | 0.871                    | 0.885                |  -              | 0.746                     | 0.828                      |
 moral_disputes                              | 0.468                    | 0.624                    | 0.323               | 0.609               | 0.618                | 0.526         | 0.686         | 0.671         | 0.680         | 0.736          | 0.604         | 0.612         | 0.456                   | 0.592                   | 0.083                         | 0.442                   | 0.580                   | 0.589               | 0.607                    | 0.696                 | 0.618                 | 0.578                 | 0.621                 | 0.283                 | 0.560                 | 0.644    | 0.572                      | 0.524                    | 0.552                          | 0.598             | 0.645                    | 0.647                  | 0.644                    | 0.635                    | 0.615                    | 0.621                      | 0.696                      | 0.748 | 0.658             | 0.630             | 0.537                   | 0.566                   | 0.664                   | 0.676                   | 0.725                    | 0.760                |  -              | 0.554                     | 0.708                      |
 moral_scenarios                             | 0.234                    | 0.344                    | 0.115               | 0.165               | 0.411                | 0.227         | 0.330         | 0.410         | 0.325         | 0.366          | 0.307         | 0.360         | 0.082                   | 0.164                   | 0.007                         | 0.004                   | 0.243                   | 0.280               | 0.243                    | 0.439                 | 0.184                 | 0.153                 | 0.205                 | 0.136                 | 0.410                 | 0.283    | 0.246                      | 0.122                    | 0.226                          | 0.229             | 0.327                    | 0.391                  | 0.317                    | 0.288                    | 0.366                    | 0.404                      | 0.538                      | 0.582 | 0.336             | 0.139             | 0.130                   | 0.058                   | 0.318                   | 0.368                   | 0.546                    | 0.565                |  -              | 0.188                     | 0.477                      |
 nutrition                                   | 0.539                    | 0.666                    | 0.313               | 0.650               | 0.666                | 0.591         | 0.683         | 0.669         | 0.683         | 0.758          | 0.643         | 0.653         | 0.486                   | 0.611                   | 0.039                         | 0.496                   | 0.591                   | 0.650               | 0.653                    | 0.751                 | 0.686                 | 0.660                 | 0.689                 | 0.405                 | 0.620                 | 0.735    | 0.647                      | 0.555                    | 0.614                          | 0.624             | 0.633                    | 0.660                  | 0.630                    | 0.630                    | 0.669                    | 0.620                      | 0.751                      | 0.771 | 0.692             | 0.647             | 0.647                   | 0.630                   | 0.745                   | 0.745                   | 0.790                    | 0.797                |  -              | 0.575                     | 0.722                      |
 philosophy                                  | 0.491                    | 0.598                    | 0.327               | 0.681               | 0.675                | 0.527         | 0.654         | 0.641         | 0.658         | 0.713          | 0.652         | 0.659         | 0.472                   | 0.623                   | 0.051                         | 0.485                   | 0.594                   | 0.636               | 0.590                    | 0.723                 | 0.614                 | 0.639                 | 0.617                 | 0.340                 | 0.578                 | 0.646    | 0.598                      | 0.587                    | 0.633                          | 0.612             | 0.580                    | 0.636                  | 0.630                    | 0.598                    | 0.630                    | 0.588                      | 0.704                      | 0.784 | 0.646             | 0.578             | 0.562                   | 0.565                   | 0.675                   | 0.688                   | 0.774                    | 0.778                |  -              | 0.554                     | 0.717                      |
 prehistory                                  | 0.549                    | 0.685                    | 0.308               | 0.660               | 0.697                | 0.518         | 0.727         | 0.730         | 0.728         | 0.783          | 0.635         | 0.663         | 0.484                   | 0.638                   | 0.061                         | 0.469                   | 0.601                   | 0.669               | 0.638                    | 0.746                 | 0.679                 | 0.691                 | 0.700                 | 0.348                 | 0.604                 | 0.709    | 0.623                      | 0.580                    | 0.675                          | 0.697             | 0.648                    | 0.731                  | 0.688                    | 0.675                    | 0.697                    | 0.663                      | 0.774                      | 0.805 | 0.675             | 0.629             | 0.641                   | 0.666                   | 0.762                   | 0.756                   | 0.836                    | 0.861                |  -              | 0.595                     | 0.783                      |
 professional_accounting                     | 0.319                    | 0.400                    | 0.184               | 0.418               | 0.432                | 0.326         | 0.507         | 0.489         | 0.496         | 0.514          | 0.404         | 0.425         | 0.262                   | 0.354                   | 0.028                         | 0.244                   | 0.319                   | 0.453               | 0.386                    | 0.492                 | 0.358                 | 0.386                 | 0.393                 | 0.102                 | 0.336                 | 0.432    | 0.382                      | 0.336                    | 0.421                          | 0.361             | 0.382                    | 0.414                  | 0.421                    | 0.397                    | 0.418                    | 0.386                      | 0.578                      | 0.510 | 0.443             | 0.478             | 0.386                   | 0.414                   | 0.457                   | 0.460                   | 0.560                    | 0.631                |  -              | 0.358                     | 0.514                      |
 professional_law                            | 0.292                    | 0.393                    | 0.202               | 0.397               | 0.417                | 0.307         | 0.486         | 0.478         | 0.478         | 0.528          | 0.404         | 0.408         | 0.329                   | 0.383                   | 0.073                         | 0.305                   | 0.387                   | 0.359               | 0.363                    | 0.367                 | 0.359                 | 0.377                 | 0.397                 | 0.180                 | 0.369                 | 0.372    | 0.383                      | 0.333                    | 0.379                          | 0.399             | 0.383                    | 0.440                  | 0.402                    | 0.405                    | 0.410                    | 0.401                      | 0.498                      | 0.492 | 0.423             | 0.386             | 0.340                   | 0.337                   | 0.401                   | 0.402                   | 0.477                    | 0.541                |  -              | 0.350                     | 0.481                      |
 professional_medicine                       | 0.426                    | 0.558                    | 0.235               | 0.639               | 0.636                | 0.485         | 0.749         | 0.774         | 0.756         | 0.794          | 0.654         | 0.680         | 0.375                   | 0.588                   | 0.014                         | 0.419                   | 0.591                   | 0.665               | 0.682                    | 0.761                 | 0.713                 | 0.727                 | 0.724                 | 0.323                 | 0.713                 | 0.643    | 0.672                      | 0.564                    | 0.705                          | 0.642             | 0.619                    | 0.705                  | 0.654                    | 0.643                    | 0.687                    | 0.658                      | 0.794                      | 0.823 | 0.658             | 0.613             | 0.573                   | 0.580                   | 0.680                   | 0.683                   | 0.812                    | 0.845                |  -              | 0.645                     | 0.794                      |
 professional_psychology                     | 0.446                    | 0.591                    | 0.300               | 0.647               | 0.665                | 0.477         | 0.722         | 0.717         | 0.728         | 0.805          | 0.598         | 0.609         | 0.449                   | 0.535                   | 0.044                         | 0.388                   | 0.547                   | 0.599               | 0.580                    | 0.686                 | 0.616                 | 0.619                 | 0.642                 | 0.256                 | 0.509                 | 0.669    | 0.565                      | 0.521                    | 0.602                          | 0.560             | 0.588                    | 0.686                  | 0.648                    | 0.638                    | 0.655                    | 0.617                      | 0.764                      | 0.799 | 0.671             | 0.637             | 0.586                   | 0.591                   | 0.707                   | 0.702                   | 0.776                    | 0.810                |  -              | 0.529                     | 0.759                      |
 public_relations                            | 0.463                    | 0.563                    | 0.409               | 0.563               | 0.600                | 0.563         | 0.690         | 0.700         | 0.700         | 0.672          | 0.572         | 0.627         | 0.500                   | 0.581                   | 0.072                         | 0.454                   | 0.590                   | 0.636               | 0.590                    | 0.636                 | 0.536                 | 0.518                 | 0.518                 | 0.245                 | 0.545                 | 0.590    | 0.581                      | 0.554                    | 0.627                          | 0.581             | 0.518                    | 0.618                  | 0.572                    | 0.627                    | 0.554                    | 0.572                      | 0.672                      | 0.727 | 0.636             | 0.618             | 0.563                   | 0.572                   | 0.627                   | 0.645                   | 0.736                    | 0.663                |  -              | 0.554                     | 0.581                      |
 security_studies                            | 0.616                    | 0.730                    | 0.240               | 0.608               | 0.644                | 0.616         | 0.710         | 0.751         | 0.746         | 0.763          | 0.624         | 0.632         | 0.514                   | 0.648                   | 0.102                         | 0.530                   | 0.583                   | 0.759               | 0.644                    | 0.738                 | 0.665                 | 0.653                 | 0.665                 | 0.440                 | 0.616                 | 0.697    | 0.673                      | 0.600                    | 0.608                          | 0.612             | 0.628                    | 0.685                  | 0.693                    | 0.697                    | 0.669                    | 0.673                      | 0.738                      | 0.730 | 0.665             | 0.669             | 0.620                   | 0.653                   | 0.718                   | 0.718                   | 0.767                    | 0.775                |  -              | 0.575                     | 0.730                      |
 sociology                                   | 0.701                    | 0.791                    | 0.412               | 0.781               | 0.791                | 0.666         | 0.800         | 0.825         | 0.815         | 0.860          | 0.736         | 0.741         | 0.626                   | 0.716                   | 0.039                         | 0.691                   | 0.741                   | 0.810               | 0.766                    | 0.830                 | 0.751                 | 0.756                 | 0.786                 | 0.452                 | 0.741                 | 0.835    | 0.771                      | 0.716                    | 0.786                          | 0.776             | 0.781                    | 0.805                  | 0.800                    | 0.800                    | 0.820                    | 0.781                      | 0.850                      | 0.870 | 0.825             | 0.820             | 0.716                   | 0.736                   | 0.815                   | 0.825                   | 0.855                    | 0.860                |  -              | 0.741                     | 0.835                      |
 us_foreign_policy                           | 0.630                    | 0.820                    | 0.510               | 0.780               | 0.790                | 0.690         | 0.838         | 0.868         | 0.868         | 0.840          | 0.780         | 0.800         | 0.700                   | 0.790                   | 0.150                         | 0.680                   | 0.760                   | 0.790               | 0.777                    | 0.890                 | 0.790                 | 0.770                 | 0.800                 | 0.450                 | 0.800                 | 0.820    | 0.840                      | 0.757                    | 0.740                          | 0.787             | 0.787                    | 0.790                  | 0.760                    | 0.740                    | 0.760                    | 0.770                      | 0.850                      | 0.890 | 0.800             | 0.820             | 0.750                   | 0.780                   | 0.820                   | 0.820                   | 0.890                    | 0.880                |  -              | 0.757                     | 0.810                      |
 virology                                    | 0.325                    | 0.463                    | 0.246               | 0.433               | 0.445                | 0.433         | 0.457         | 0.475         | 0.472         | 0.506          | 0.415         | 0.439         | 0.367                   | 0.439                   | 0.096                         | 0.307                   | 0.379                   | 0.415               | 0.460                    | 0.506                 | 0.415                 | 0.439                 | 0.439                 | 0.301                 | 0.415                 | 0.463    | 0.475                      | 0.387                    | 0.421                          | 0.436             | 0.448                    | 0.421                  | 0.391                    | 0.379                    | 0.403                    | 0.367                      | 0.487                      | 0.500 | 0.457             | 0.421             | 0.373                   | 0.427                   | 0.463                   | 0.457                   | 0.487                    | 0.518                |  -              | 0.381                     | 0.487                      |
 world_religions                             | 0.637                    | 0.713                    | 0.403               | 0.748               | 0.801                | 0.678         | 0.800         | 0.817         | 0.800         | 0.847          | 0.766         | 0.766         | 0.643                   | 0.760                   | 0.052                         | 0.614                   | 0.736                   | 0.801               | 0.729                    | 0.801                 | 0.766                 | 0.783                 | 0.789                 | 0.508                 | 0.742                 | 0.801    | 0.777                      | 0.747                    | 0.789                          | 0.800             | 0.747                    | 0.766                  | 0.754                    | 0.742                    | 0.742                    | 0.725                      | 0.801                      | 0.836 | 0.766             | 0.748             | 0.783                   | 0.760                   | 0.818                   | 0.818                   | 0.859                    | 0.871                |  -              | 0.705                     | 0.812                      |
 MMLU                                        | 0.465                    | 0.585                    | 0.285               | 0.591               | 0.623                | 0.475         | 0.646         | 0.652         | 0.647         | 0.687          | 0.580         | 0.595         | 0.429                   | 0.530                   | 0.055                         | 0.413                   | 0.529                   | 0.595               | 0.537                    | 0.640                 | 0.556                 | 0.552                 | 0.570                 | 0.281                 | 0.525                 | 0.613    | 0.550                      | 0.486                    | 0.555                          | 0.544             | 0.553                    | 0.618                  | 0.590                    | 0.578                    | 0.599                    | 0.578                      | 0.682                      | 0.710 | 0.610             | 0.576             | 0.532                   | 0.540                   | 0.639                   | 0.643                   | 0.721                    | 0.757                |  -              | 0.509                     | 0.666                      |
 AGIEVAL
 aquarat                                     | 0.645                    | 0.763                    | 0.374               | 0.602               | 0.562                | 0.460         | 0.677         | 0.696         | 0.665         | 0.602          | 0.653         | 0.637         | 0.488                   | 0.614                   | 0.279                         | 0.488                   | 0.594                   | 0.657               | 0.145                    | 0.653                 | 0.681                 | 0.673                 | 0.598                 | 0.370                 | 0.633                 | 0.755    | 0.712                      | 0.279                    | 0.322                          | 0.582             | 0.157                    | 0.129                  | 0.212                    | 0.338                    | 0.409                    | 0.574                      | 0.614                      | 0.834 | 0.712             | 0.566             | 0.732                   | 0.728                   | 0.799                   | 0.830                   | 0.822                    | 0.870                |  -              | 0.425                     | 0.590                      |
 logiqa                                      | 0.274                    | 0.359                    | 0.208               | 0.356               | 0.337                | 0.321         | 0.443         | 0.440         | 0.447         | 0.477          | 0.399         | 0.416         | 0.282                   | 0.324                   | 0.052                         | 0.248                   | 0.321                   | 0.393               | 0.333                    | 0.413                 | 0.311                 | 0.321                 | 0.328                 | 0.168                 | 0.265                 | 0.376    | 0.324                      | 0.264                    | 0.311                          | 0.330             | 0.327                    | 0.339                  | 0.308                    | 0.285                    | 0.281                    | 0.267                      | 0.405                      | 0.445 | 0.359             | 0.351             | 0.316                   | 0.342                   | 0.427                   | 0.436                   | 0.493                    | 0.554                |  -              | 0.290                     | 0.391                      |
 lsatar                                      | 0.260                    | 0.256                    | 0.213               | 0.213               | 0.282                | 0.191         | 0.234         | 0.239         | 0.208         | 0.260          | 0.073         | 0.217         | 0.234                   | 0.226                   | 0.208                         | 0.221                   | 0.213                   | 0.239               | 0.200                    | 0.252                 | 0.278                 | 0.278                 | 0.295                 | 0.200                 | 0.239                 | 0.252    | 0.186                      | 0.186                    | 0.200                          | 0.278             | 0.208                    | 0.265                  | 0.260                    | 0.252                    | 0.256                    | 0.247                      | 0.208                      | 0.369 | 0.252             | 0.178             | 0.230                   | 0.226                   | 0.260                   | 0.300                   | 0.321                    | 0.400                |  -              | 0.173                     | 0.226                      |
 lsatlr                                      | 0.400                    | 0.525                    | 0.203               | 0.486               | 0.537                | 0.337         | 0.625         | 0.627         | 0.635         | 0.654          | 0.505         | 0.515         | 0.296                   | 0.415                   | 0.043                         | 0.243                   | 0.449                   | 0.513               | 0.445                    | 0.490                 | 0.425                 | 0.433                 | 0.441                 | 0.180                 | 0.327                 | 0.541    | 0.447                      | 0.366                    | 0.445                          | 0.523             | 0.570                    | 0.456                  | 0.431                    | 0.429                    | 0.415                    | 0.386                      | 0.598                      | 0.621 | 0.456             | 0.466             | 0.452                   | 0.449                   | 0.598                   | 0.603                   | 0.729                    | 0.811                |  -              | 0.449                     | 0.576                      |
 lsatrc                                      | 0.513                    | 0.680                    | 0.312               | 0.594               | 0.646                | 0.431         | 0.747         | 0.747         | 0.750         | 0.754          | 0.635         | 0.643         | 0.390                   | 0.557                   | 0.048                         | 0.379                   | 0.565                   | 0.643               | 0.605                    | 0.672                 | 0.591                 | 0.635                 | 0.624                 | 0.223                 | 0.486                 | 0.657    | 0.635                      | 0.520                    | 0.650                          | 0.613             | 0.617                    | 0.568                  | 0.513                    | 0.557                    | 0.531                    | 0.524                      | 0.672                      | 0.762 | 0.583             | 0.572             | 0.553                   | 0.617                   | 0.661                   | 0.687                   | 0.810                    | 0.836                |  -              | 0.654                     | 0.706                      |
 saten                                       | 0.703                    | 0.796                    | 0.470               | 0.791               | 0.810                | 0.665         | 0.839         | 0.844         | 0.834         | 0.868          | 0.815         | 0.820         | 0.519                   | 0.791                   | 0.165                         | 0.582                   | 0.757                   | 0.820               | 0.762                    | 0.834                 | 0.776                 | 0.762                 | 0.781                 | 0.339                 | 0.689                 | 0.810    | 0.825                      | 0.679                    | 0.747                          | 0.786             | 0.786                    | 0.737                  | 0.708                    | 0.713                    | 0.713                    | 0.708                      | 0.800                      | 0.830 | 0.781             | 0.776             | 0.733                   | 0.776                   | 0.810                   | 0.844                   | 0.888                    | 0.922                |  -              | 0.757                     | 0.796                      |
 satmath                                     | 0.840                    | 0.936                    | 0.559               | 0.790               | 0.822                | 0.627         | 0.900         | 0.872         | 0.886         | 0.768          | 0.863         | 0.868         | 0.577                   | 0.745                   | 0.390                         | 0.650                   | 0.800                   | 0.904               | 0.377                    | 0.804                 | 0.768                 | 0.822                 | 0.618                 | 0.550                 | 0.845                 | 0.968    | 0.886                      | 0.400                    | 0.395                          | 0.690             | 0.413                    | 0.190                  | 0.331                    | 0.509                    | 0.713                    | 0.754                      | 0.727                      | 0.977 | 0.900             | 0.731             | 0.900                   | 0.922                   | 0.963                   | 0.963                   | 0.990                    | 0.981                |  -              | 0.540                     | 0.768                      |
 AGIEVAL                                     | 0.459                    | 0.558                    | 0.294               | 0.503               | 0.523                | 0.398         | 0.600         | 0.600         | 0.598         | 0.602          | 0.525         | 0.546         | 0.364                   | 0.473                   | 0.131                         | 0.352                   | 0.479                   | 0.547               | 0.397                    | 0.544                 | 0.489                 | 0.501                 | 0.480                 | 0.253                 | 0.433                 | 0.567    | 0.512                      | 0.359                    | 0.416                          | 0.501             | 0.432                    | 0.382                  | 0.381                    | 0.409                    | 0.429                    | 0.438                      | 0.546                      | 0.638 | 0.522             | 0.481             | 0.501                   | 0.520                   | 0.599                   | 0.616                   | 0.681                    | 0.734                |  -              | 0.434                     | 0.544                      |
 AGIEVALC_biology                            |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.756         | 0.778         | 0.204                   | 0.334                   | 0.004                         | 0.204                   | 0.326                   | 0.830               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.769    | 0.430                      |  -                       | 0.526                          | 0.304             | 0.408                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.691             |  -                | 0.660                   | 0.700                   | 0.804                   | 0.813                   | 0.834                    | 0.582                |  -              | 0.356                     | 0.508                      |
 AGIEVALC_chemistry                          |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.642         | 0.691         | 0.142                   | 0.250                   | 0.000                         | 0.117                   | 0.215                   | 0.598               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.509    | 0.259                      |  -                       | 0.343                          | 0.215             | 0.313                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.563             |  -                | 0.441                   | 0.470                   | 0.583                   | 0.627                   | 0.696                    | 0.789                |  -              | 0.171                     | 0.348                      |
 AGIEVALC_chinese                            |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.642         | 0.650         | 0.186                   | 0.186                   | 0.024                         | 0.101                   | 0.138                   | 0.682               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.552    | 0.337                      |  -                       | 0.325                          | 0.300             | 0.313                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.650             |  -                | 0.508                   | 0.504                   | 0.585                   | 0.593                   | 0.760                    | 0.735                |  -              | 0.239                     | 0.272                      |
 AGIEVALC_english                            |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.823         | 0.833         | 0.647                   | 0.748                   | 0.094                         | 0.588                   | 0.728                   | 0.947               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.846    | 0.839                      |  -                       | 0.797                          | 0.807             | 0.862                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.866             |  -                | 0.794                   | 0.839                   | 0.856                   | 0.849                   | 0.915                    | 0.924                |  -              | 0.830                     | 0.774                      |
 AGIEVALC_geography                          |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.728         | 0.728         | 0.316                   | 0.386                   | 0.040                         | 0.311                   | 0.412                   | 0.778               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.718    | 0.537                      |  -                       | 0.572                          | 0.371             | 0.457                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.768             |  -                | 0.643                   | 0.633                   | 0.753                   | 0.778                   | 0.804                    | 0.839                |  -              | 0.346                     | 0.577                      |
 AGIEVALC_history                            |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.829         | 0.834         | 0.314                   | 0.400                   | 0.021                         | 0.323                   | 0.412                   | 0.817               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.753    | 0.629                      |  -                       | 0.642                          | 0.378             | 0.485                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.821             |  -                | 0.740                   | 0.744                   | 0.774                   | 0.800                   | 0.842                    | 0.923                |  -              | 0.357                     | 0.557                      |
 AGIEVALC_jecqaca                            |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.414         | 0.440         | 0.196                   | 0.232                   | 0.022                         | 0.206                   | 0.221                   | 0.514               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.425    | 0.258                      |  -                       | 0.247                          | 0.223             | 0.273                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.566             |  -                | 0.425                   | 0.424                   | 0.482                   | 0.487                   | 0.564                    | 0.622                |  -              | 0.185                     | 0.266                      |
 AGIEVALC_jecqakd                            |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.549         | 0.559         | 0.179                   | 0.212                   | 0.033                         | 0.148                   | 0.215                   | 0.620               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.540    | 0.290                      |  -                       | 0.304                          | 0.281             | 0.275                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.636             |  -                | 0.498                   | 0.526                   | 0.592                   | 0.605                   | 0.732                    | 0.747                |  -              | 0.242                     | 0.288                      |
 AGIEVALC_logiqa                             |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.479         | 0.490         | 0.218                   | 0.296                   | 0.035                         | 0.195                   | 0.279                   | 0.519               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.442    | 0.313                      |  -                       | 0.310                          | 0.317             | 0.330                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.462             |  -                | 0.399                   | 0.405                   | 0.497                   | 0.500                   | 0.565                    | 0.588                |  -              | 0.274                     | 0.357                      |
 AGIEVALC_mathcloze                          |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.491         | 0.542         | 0.186                   | 0.237                   |  -                            | 0.288                   | 0.415                   | 0.576               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.652    | 0.508                      |  -                       | 0.288                          | 0.152             | 0.245                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.449             |  -                | 0.508                   | 0.440                   | 0.694                   | 0.686                   | 0.737                    | 0.805                | 0.864           | 0.110                     | 0.618                      |
 AGIEVALC_mathqa                             |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.621         | 0.648         | 0.343                   | 0.404                   | 0.281                         | 0.401                   | 0.465                   | 0.662               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.656    | 0.578                      |  -                       | 0.485                          | 0.296             | 0.334                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.543             |  -                | 0.595                   | 0.683                   | 0.779                   | 0.755                   | 0.808                    | 0.834                | 0.828           | 0.261                     | 0.357                      |
 AGIEVALC_physics                            |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.396         | 0.425         | 0.166                   | 0.229                   | 0.034                         | 0.166                   | 0.189                   | 0.436               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.402    | 0.206                      |  -                       | 0.258                          | 0.183             | 0.235                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.494             |  -                | 0.390                   | 0.413                   | 0.431                   | 0.500                   | 0.683                    | 0.770                | 0.741           | 0.178                     | 0.310                      |
 AGIEVALC                                    |  -                       |  -                       |  -                  |  -                  |  -                   |  -            |  -            |  -            |  -            |  -             | 0.589         | 0.607         | 0.257                   | 0.322                   | 0.076                         | 0.248                   | 0.322                   | 0.645               |  -                       |  -                    |  -                    |  -                    |  -                    |  -                    |  -                    | 0.576    | 0.409                      |  -                       | 0.404                          | 0.325             | 0.371                    |  -                     |  -                       |  -                       |  -                       |  -                         |  -                         |  -    | 0.612             |  -                | 0.529                   | 0.548                   | 0.627                   | 0.636                   | 0.716                    | 0.734                | 0.811           | 0.298                     | 0.403                      |
 BBH
 boolean_expressions                         | 0.720                    | 0.740                    | 0.544               | 0.860               | 0.876                | 0.556         | 0.764         | 0.776         | 0.768         | 0.460          | 0.848         | 0.868         | 0.812                   | 0.856                   | 0.724                         | 0.752                   | 0.700                   | 0.688               | 0.796                    | 0.804                 | 0.824                 | 0.832                 | 0.844                 | 0.460                 | 0.480                 | 0.896    | 0.728                      | 0.764                    | 0.780                          | 0.824             | 0.664                    | 0.816                  | 0.848                    | 0.800                    | 0.852                    | 0.832                      | 0.696                      | 0.936 | 0.808             | 0.776             | 0.756                   | 0.796                   | 0.864                   | 0.880                   | 0.888                    | 0.808                |  -              | 0.720                     | 0.540                      |
 causal_judgement                            | 0.588                    | 0.598                    | 0.550               | 0.577               | 0.582                | 0.524         | 0.609         | 0.604         | 0.598         | 0.604          | 0.550         | 0.550         | 0.566                   | 0.598                   | 0.566                         | 0.604                   | 0.588                   | 0.689               | 0.540                    | 0.513                 | 0.545                 | 0.518                 | 0.540                 | 0.502                 | 0.518                 | 0.577    | 0.593                      | 0.588                    | 0.625                          | 0.614             | 0.604                    | 0.556                  | 0.508                    | 0.598                    | 0.588                    | 0.593                      | 0.588                      | 0.647 | 0.625             | 0.582             | 0.497                   | 0.529                   | 0.508                   | 0.513                   | 0.647                    | 0.700                |  -              | 0.636                     | 0.641                      |
 date_understanding                          | 0.576                    | 0.752                    | 0.324               | 0.668               | 0.748                | 0.592         | 0.780         | 0.764         | 0.748         | 0.788          | 0.580         | 0.572         | 0.560                   | 0.684                   | 0.284                         | 0.628                   | 0.660                   | 0.832               | 0.700                    | 0.660                 | 0.732                 | 0.724                 | 0.716                 | 0.400                 | 0.664                 | 0.728    | 0.772                      | 0.548                    | 0.668                          | 0.592             | 0.608                    | 0.644                  | 0.464                    | 0.568                    | 0.696                    | 0.576                      | 0.780                      | 0.932 | 0.544             | 0.544             | 0.616                   | 0.648                   | 0.764                   | 0.740                   | 0.856                    | 0.872                |  -              | 0.556                     | 0.724                      |
 disambiguation_qa                           | 0.612                    | 0.588                    | 0.400               | 0.712               | 0.668                | 0.532         | 0.688         | 0.652         | 0.660         | 0.720          | 0.584         | 0.636         | 0.628                   | 0.640                   | 0.380                         | 0.616                   | 0.648                   | 0.732               | 0.584                    | 0.536                 | 0.552                 | 0.540                 | 0.516                 | 0.424                 | 0.472                 | 0.688    | 0.644                      | 0.600                    | 0.596                          | 0.728             | 0.704                    | 0.604                  | 0.640                    | 0.592                    | 0.720                    | 0.752                      | 0.692                      | 0.768 | 0.660             | 0.696             | 0.544                   | 0.556                   | 0.656                   | 0.636                   | 0.764                    | 0.780                |  -              | 0.576                     | 0.640                      |
 dyck_languages                              | 0.596                    | 0.664                    | 0.424               | 0.704               | 0.712                | 0.476         | 0.752         | 0.720         | 0.728         | 0.600          | 0.516         | 0.544         | 0.560                   | 0.704                   | 0.356                         | 0.700                   | 0.664                   | 0.728               | 0.792                    | 0.228                 | 0.832                 | 0.724                 | 0.796                 | 0.536                 | 0.680                 | 0.736    | 0.756                      | 0.744                    | 0.712                          | 0.664             | 0.732                    | 0.752                  | 0.532                    | 0.424                    | 0.580                    | 0.468                      | 0.532                      | 0.776 | 0.756             | 0.576             | 0.596                   | 0.628                   | 0.868                   | 0.836                   | 0.648                    | 0.820                |  -              | 0.684                     | 0.572                      |
 formal_fallacies                            | 0.764                    | 0.636                    | 0.624               | 0.740               | 0.660                | 0.532         | 0.868         | 0.824         | 0.832         | 0.760          | 0.568         | 0.660         | 0.960                   | 0.640                   | 0.896                         | 0.992                   | 0.832                   | 0.920               | 0.780                    | 0.992                 | 0.920                 | 0.988                 | 0.984                 | 0.992                 | 0.816                 | 0.672    | 0.532                      | 0.852                    | 0.996                          | 0.632             | 0.564                    | 0.876                  | 0.876                    | 0.920                    | 0.808                    | 0.808                      | 0.944                      | 0.804 | 0.632             | 0.716             | 0.928                   | 0.852                   | 0.628                   | 0.628                   | 0.784                    | 0.812                |  -              | 0.776                     | 0.576                      |
 geometric_shapes                            | 0.420                    | 0.520                    | 0.056               | 0.544               | 0.456                | 0.204         | 0.400         | 0.384         | 0.436         | 0.420          | 0.392         | 0.400         | 0.268                   | 0.368                   | 0.096                         | 0.352                   | 0.488                   | 0.840               | 0.352                    | 0.400                 | 0.440                 | 0.488                 | 0.440                 | 0.088                 | 0.416                 | 0.564    | 0.520                      | 0.288                    | 0.404                          | 0.348             | 0.344                    | 0.468                  | 0.372                    | 0.248                    | 0.416                    | 0.292                      | 0.328                      | 0.648 | 0.356             | 0.276             | 0.204                   | 0.212                   | 0.544                   | 0.604                   | 0.584                    | 0.640                |  -              | 0.268                     | 0.400                      |
 hyperbaton                                  | 0.724                    | 0.872                    | 0.512               | 0.572               | 0.680                | 0.704         | 0.888         | 0.856         | 0.884         | 0.836          | 0.740         | 0.824         | 0.612                   | 0.724                   | 0.468                         | 0.604                   | 0.704                   | 0.928               | 0.712                    | 0.752                 | 0.824                 | 0.768                 | 0.880                 | 0.588                 | 0.624                 | 0.664    | 0.804                      | 0.656                    | 0.644                          | 0.828             | 0.724                    | 0.800                  | 0.968                    | 0.940                    | 0.936                    | 0.936                      | 0.952                      | 0.996 | 0.704             | 0.656             | 0.636                   | 0.676                   | 0.832                   | 0.792                   | 0.868                    | 0.956                |  -              | 0.744                     | 0.900                      |
 logical_deduction_five_objects              | 0.592                    | 0.724                    | 0.176               | 0.700               | 0.532                | 0.300         | 0.596         | 0.636         | 0.568         | 0.608          | 0.528         | 0.516         | 0.352                   | 0.464                   | 0.204                         | 0.424                   | 0.520                   | 0.660               | 0.500                    | 0.576                 | 0.540                 | 0.536                 | 0.568                 | 0.236                 | 0.484                 | 0.752    | 0.592                      | 0.352                    | 0.556                          | 0.384             | 0.472                    | 0.580                  | 0.464                    | 0.432                    | 0.632                    | 0.532                      | 0.532                      | 0.940 | 0.556             | 0.524             | 0.468                   | 0.528                   | 0.752                   | 0.728                   | 0.876                    | 0.924                |  -              | 0.436                     | 0.612                      |
 logical_deduction_seven_objects             | 0.540                    | 0.672                    | 0.152               | 0.556               | 0.492                | 0.284         | 0.580         | 0.564         | 0.560         | 0.552          | 0.444         | 0.500         | 0.284                   | 0.388                   | 0.140                         | 0.376                   | 0.464                   | 0.648               | 0.472                    | 0.516                 | 0.472                 | 0.484                 | 0.488                 | 0.216                 | 0.408                 | 0.640    | 0.500                      | 0.296                    | 0.452                          | 0.320             | 0.400                    | 0.564                  | 0.476                    | 0.308                    | 0.568                    | 0.500                      | 0.444                      | 0.920 | 0.464             | 0.416             | 0.420                   | 0.436                   | 0.668                   | 0.656                   | 0.792                    | 0.864                |  -              | 0.388                     | 0.560                      |
 logical_deduction_three_objects             | 0.780                    | 0.980                    | 0.376               | 0.868               | 0.820                | 0.440         | 0.860         | 0.868         | 0.844         | 0.892          | 0.836         | 0.840         | 0.524                   | 0.664                   | 0.320                         | 0.596                   | 0.744                   | 0.896               | 0.632                    | 0.736                 | 0.760                 | 0.764                 | 0.804                 | 0.340                 | 0.652                 | 0.932    | 0.844                      | 0.608                    | 0.800                          | 0.664             | 0.620                    | 0.836                  | 0.724                    | 0.688                    | 0.844                    | 0.804                      | 0.884                      | 0.992 | 0.736             | 0.716             | 0.696                   | 0.720                   | 0.940                   | 0.956                   | 0.980                    | 0.992                |  -              | 0.664                     | 0.888                      |
 movie_recommendation                        | 0.504                    | 0.428                    | 0.424               | 0.652               | 0.676                | 0.568         | 0.560         | 0.552         | 0.552         | 0.508          | 0.604         | 0.648         | 0.440                   | 0.528                   | 0.224                         | 0.380                   | 0.476                   | 0.884               | 0.532                    | 0.504                 | 0.548                 | 0.540                 | 0.536                 | 0.336                 | 0.456                 | 0.564    | 0.604                      | 0.508                    | 0.448                          | 0.552             | 0.540                    | 0.572                  | 0.544                    | 0.540                    | 0.520                    | 0.508                      | 0.584                      | 0.992 | 0.548             | 0.492             | 0.604                   | 0.568                   | 0.556                   | 0.536                   | 0.672                    | 0.648                |  -              | 0.584                     | 0.676                      |
 multistep_arithmetic_two                    | 0.368                    | 0.536                    | 0.136               | 0.944               | 0.968                | 0.288         | 0.480         | 0.472         | 0.488         | 0.472          | 0.580         | 0.524         | 0.340                   | 0.464                   | 0.072                         | 0.272                   | 0.508                   | 0.372               | 0.248                    | 0.060                 | 0.712                 | 0.704                 | 0.700                 | 0.240                 | 0.532                 | 0.824    | 0.540                      | 0.108                    | 0.432                          | 0.164             | 0.292                    | 0.612                  | 0.624                    | 0.272                    | 0.836                    | 0.420                      | 0.460                      | 0.984 | 0.532             | 0.324             | 0.852                   | 0.876                   | 0.896                   | 0.948                   | 0.964                    | 0.976                |  -              | 0.252                     | 0.536                      |
 navigate                                    | 0.556                    | 0.608                    | 0.540               | 0.580               | 0.588                | 0.580         | 0.588         | 0.588         | 0.596         | 0.648          | 0.420         | 0.420         | 0.580                   | 0.592                   | 0.552                         | 0.588                   | 0.580                   | 0.452               | 0.576                    | 0.560                 | 0.520                 | 0.580                 | 0.580                 | 0.580                 | 0.580                 | 0.596    | 0.572                      | 0.600                    | 0.588                          | 0.568             | 0.580                    | 0.612                  | 0.644                    | 0.596                    | 0.588                    | 0.584                      | 0.636                      | 0.640 | 0.596             | 0.592             | 0.576                   | 0.572                   | 0.596                   | 0.596                   | 0.624                    | 0.684                |  -              | 0.520                     | 0.652                      |
 object_counting                             | 0.704                    | 0.756                    | 0.464               | 0.764               | 0.820                | 0.612         | 0.800         | 0.808         | 0.848         | 0.856          | 0.616         | 0.660         | 0.624                   | 0.760                   | 0.460                         | 0.680                   | 0.776                   | 0.644               | 0.852                    | 0.896                 | 0.820                 | 0.772                 | 0.864                 | 0.524                 | 0.808                 | 0.872    | 0.908                      | 0.608                    | 0.716                          | 0.564             | 0.796                    | 0.876                  | 0.696                    | 0.244                    | 0.836                    | 0.344                      | 0.372                      | 0.996 | 0.660             | 0.676             | 0.740                   | 0.764                   | 0.848                   | 0.804                   | 0.892                    | 0.896                |  -              | 0.680                     | 0.756                      |
 penguins_in_a_table                         | 0.835                    | 0.952                    | 0.369               | 0.842               | 0.746                | 0.506         | 0.883         | 0.869         | 0.890         | 0.842          | 0.917         | 0.917         | 0.527                   | 0.705                   | 0.260                         | 0.595                   | 0.705                   | 0.815               | 0.767                    | 0.863                 | 0.856                 | 0.821                 | 0.856                 | 0.356                 | 0.801                 | 0.958    | 0.917                      | 0.623                    | 0.801                          | 0.575             | 0.760                    | 0.719                  | 0.486                    | 0.465                    | 0.883                    | 0.712                      | 0.815                      | 1.000 | 0.835             | 0.719             | 0.821                   | 0.849                   | 0.945                   | 0.924                   | 0.958                    | 0.986                |  -              | 0.636                     | 0.828                      |
 reasoning_about_colored_objects             | 0.744                    | 0.872                    | 0.276               | 0.860               | 0.800                | 0.484         | 0.700         | 0.700         | 0.744         | 0.900          | 0.876         | 0.796         | 0.548                   | 0.668                   | 0.200                         | 0.528                   | 0.768                   | 0.904               | 0.740                    | 0.800                 | 0.760                 | 0.820                 | 0.824                 | 0.276                 | 0.568                 | 0.880    | 0.904                      | 0.608                    | 0.752                          | 0.648             | 0.752                    | 0.696                  | 0.664                    | 0.656                    | 0.808                    | 0.656                      | 0.896                      | 0.968 | 0.764             | 0.716             | 0.700                   | 0.764                   | 0.904                   | 0.868                   | 0.944                    | 0.984                |  -              | 0.600                     | 0.840                      |
 ruin_names                                  | 0.428                    | 0.552                    | 0.176               | 0.484               | 0.636                | 0.480         | 0.720         | 0.692         | 0.716         | 0.760          | 0.696         | 0.652         | 0.348                   | 0.528                   | 0.208                         | 0.356                   | 0.524                   | 0.932               | 0.724                    | 0.736                 | 0.676                 | 0.680                 | 0.744                 | 0.348                 | 0.532                 | 0.488    | 0.556                      | 0.400                    | 0.584                          | 0.408             | 0.592                    | 0.628                  | 0.528                    | 0.596                    | 0.612                    | 0.600                      | 0.636                      | 0.816 | 0.564             | 0.560             | 0.396                   | 0.324                   | 0.440                   | 0.544                   | 0.692                    | 0.760                |  -              | 0.536                     | 0.616                      |
 salient_translation_error_detection         | 0.540                    | 0.608                    | 0.212               | 0.448               | 0.508                | 0.420         | 0.532         | 0.580         | 0.548         | 0.568          | 0.476         | 0.488         | 0.360                   | 0.516                   | 0.164                         | 0.360                   | 0.468                   | 0.644               | 0.452                    | 0.504                 | 0.436                 | 0.504                 | 0.512                 | 0.188                 | 0.464                 | 0.572    | 0.556                      | 0.444                    | 0.472                          | 0.524             | 0.560                    | 0.508                  | 0.448                    | 0.408                    | 0.520                    | 0.532                      | 0.596                      | 0.636 | 0.456             | 0.444             | 0.452                   | 0.432                   | 0.560                   | 0.572                   | 0.612                    | 0.700                |  -              | 0.532                     | 0.588                      |
 snarks                                      | 0.606                    | 0.752                    | 0.483               | 0.685               | 0.707                | 0.584         | 0.646         | 0.696         | 0.691         | 0.719          | 0.702         | 0.707         | 0.561                   | 0.685                   | 0.488                         | 0.612                   | 0.640                   | 0.820               | 0.668                    | 0.696                 | 0.651                 | 0.685                 | 0.651                 | 0.488                 | 0.657                 | 0.730    | 0.691                      | 0.606                    | 0.691                          | 0.533             | 0.640                    | 0.617                  | 0.612                    | 0.735                    | 0.747                    | 0.786                      | 0.747                      | 0.882 | 0.657             | 0.651             | 0.662                   | 0.623                   | 0.747                   | 0.780                   | 0.831                    | 0.865                |  -              | 0.646                     | 0.837                      |
 sports_understanding                        | 0.644                    | 0.692                    | 0.584               | 0.672               | 0.692                | 0.724         | 0.824         | 0.796         | 0.788         | 0.816          | 0.472         | 0.468         | 0.708                   | 0.780                   | 0.460                         | 0.684                   | 0.772                   | 0.920               | 0.684                    | 0.696                 | 0.636                 | 0.744                 | 0.720                 | 0.572                 | 0.644                 | 0.680    | 0.640                      | 0.716                    | 0.800                          | 0.836             | 0.792                    | 0.612                  | 0.600                    | 0.596                    | 0.596                    | 0.600                      | 0.748                      | 0.740 | 0.776             | 0.784             | 0.620                   | 0.616                   | 0.676                   | 0.684                   | 0.680                    | 0.748                |  -              | 0.828                     | 0.740                      |
 temporal_sequences                          | 0.408                    | 0.796                    | 0.164               | 0.528               | 0.540                | 0.124         | 0.680         | 0.680         | 0.708         | 0.748          | 0.756         | 0.840         | 0.216                   | 0.576                   | 0.272                         | 0.500                   | 0.700                   | 0.976               | 0.792                    | 0.688                 | 0.804                 | 0.788                 | 0.856                 | 0.204                 | 0.712                 | 0.508    | 0.360                      | 0.404                    | 0.544                          | 0.524             | 0.508                    | 0.860                  | 0.612                    | 0.800                    | 0.784                    | 0.508                      | 0.892                      | 1.000 | 0.596             | 0.356             | 0.324                   | 0.388                   | 0.800                   | 0.820                   | 0.988                    | 0.992                |  -              | 0.568                     | 0.920                      |
 tracking_shuffled_objects_five_objects      | 0.976                    | 1.000                    | 0.208               | 0.560               | 0.616                | 0.216         | 0.536         | 0.600         | 0.600         | 0.692          | 0.544         | 0.536         | 0.588                   | 0.536                   | 0.400                         | 0.496                   | 0.408                   | 0.572               | 0.552                    | 0.520                 | 0.568                 | 0.596                 | 0.656                 | 0.152                 | 0.500                 | 0.852    | 0.792                      | 0.344                    | 0.736                          | 0.356             | 0.468                    | 0.848                  | 0.664                    | 0.612                    | 0.940                    | 0.712                      | 0.776                      | 1.000 | 0.476             | 0.400             | 0.420                   | 0.452                   | 0.840                   | 0.908                   | 0.924                    | 0.972                |  -              | 0.364                     | 0.420                      |
 tracking_shuffled_objects_seven_objects     | 0.932                    | 0.952                    | 0.140               | 0.324               | 0.524                | 0.152         | 0.576         | 0.572         | 0.572         | 0.640          | 0.512         | 0.436         | 0.484                   | 0.596                   | 0.220                         | 0.396                   | 0.344                   | 0.480               | 0.436                    | 0.420                 | 0.488                 | 0.536                 | 0.592                 | 0.120                 | 0.420                 | 0.760    | 0.728                      | 0.296                    | 0.596                          | 0.284             | 0.396                    | 0.780                  | 0.640                    | 0.568                    | 0.896                    | 0.612                      | 0.652                      | 0.984 | 0.416             | 0.320             | 0.292                   | 0.312                   | 0.800                   | 0.868                   | 0.848                    | 0.980                |  -              | 0.372                     | 0.436                      |
 tracking_shuffled_objects_three_objects     | 0.996                    | 0.996                    | 0.288               | 0.696               | 0.732                | 0.292         | 0.716         | 0.708         | 0.732         | 0.848          | 0.620         | 0.696         | 0.604                   | 0.592                   | 0.448                         | 0.724                   | 0.740                   | 0.528               | 0.680                    | 0.696                 | 0.704                 | 0.704                 | 0.728                 | 0.304                 | 0.608                 | 0.828    | 0.832                      | 0.436                    | 0.832                          | 0.412             | 0.724                    | 0.780                  | 0.836                    | 0.572                    | 0.960                    | 0.788                      | 0.888                      | 1.000 | 0.524             | 0.420             | 0.604                   | 0.664                   | 0.832                   | 0.872                   | 0.856                    | 0.996                |  -              | 0.536                     | 0.660                      |
 web_of_lies                                 | 0.524                    | 0.544                    | 0.476               | 0.576               | 0.520                | 0.508         | 0.524         | 0.556         | 0.520         | 0.488          | 0.476         | 0.488         | 0.480                   | 0.544                   | 0.488                         | 0.508                   | 0.504                   | 0.536               | 0.524                    | 0.508                 | 0.440                 | 0.512                 | 0.512                 | 0.516                 | 0.544                 | 0.552    | 0.492                      | 0.488                    | 0.512                          | 0.488             | 0.512                    | 0.512                  | 0.492                    | 0.512                    | 0.488                    | 0.492                      | 0.548                      | 0.512 | 0.552             | 0.488             | 0.512                   | 0.512                   | 0.528                   | 0.532                   | 0.544                    | 0.624                |  -              | 0.488                     | 0.520                      |
 word_sorting                                | 0.176                    | 0.276                    | 0.056               | 0.204               | 0.292                | 0.100         | 0.424         | 0.424         | 0.404         | 0.540          | 0.404         | 0.392         | 0.216                   | 0.360                   | 0.092                         | 0.232                   | 0.292                   | 0.452               | 0.544                    | 0.000                 | 0.556                 | 0.500                 | 0.512                 | 0.160                 | 0.360                 | 0.248    | 0.340                      | 0.280                    | 0.392                          | 0.344             | 0.500                    | 0.280                  | 0.224                    | 0.168                    | 0.204                    | 0.152                      | 0.236                      | 0.360 | 0.208             | 0.188             | 0.156                   | 0.156                   | 0.212                   | 0.220                   | 0.292                    | 0.400                |  -              | 0.336                     | 0.276                      |
 BBH                                         | 0.621                    | 0.702                    | 0.334               | 0.638               | 0.650                | 0.432         | 0.663         | 0.661         | 0.664         | 0.674          | 0.596         | 0.608         | 0.507                   | 0.595                   | 0.347                         | 0.536                   | 0.598                   | 0.719               | 0.613                    | 0.582                 | 0.650                 | 0.659                 | 0.681                 | 0.373                 | 0.566                 | 0.691    | 0.652                      | 0.506                    | 0.631                          | 0.531             | 0.583                    | 0.667                  | 0.602                    | 0.549                    | 0.696                    | 0.592                      | 0.658                      | 0.846 | 0.587             | 0.536             | 0.554                   | 0.567                   | 0.709                   | 0.718                   | 0.775                    | 0.827                |  -              | 0.549                     | 0.637                      |
 MUSR
 murder_mystery                              | 0.632                    | 0.620                    | 0.552               | 0.640               | 0.592                | 0.552         | 0.688         | 0.672         | 0.668         | 0.576          | 0.616         | 0.584         | 0.544                   | 0.492                   | 0.568                         | 0.524                   | 0.568                   | 0.572               | 0.576                    | 0.560                 | 0.616                 | 0.596                 | 0.584                 | 0.540                 | 0.576                 | 0.636    | 0.624                      | 0.516                    | 0.656                          | 0.592             | 0.272                    | 0.616                  | 0.612                    | 0.636                    | 0.636                    | 0.620                      | 0.600                      | 0.708 | 0.516             | 0.156             | 0.544                   | 0.612                   | 0.604                   | 0.584                   | 0.652                    | 0.640                |  -              | 0.588                     | 0.532                      |
 object_placements                           | 0.468                    | 0.503                    | 0.429               | 0.535               | 0.578                | 0.449         | 0.539         | 0.535         | 0.519         | 0.542          | 0.492         | 0.531         | 0.480                   | 0.496                   | 0.316                         | 0.484                   | 0.496                   | 0.492               | 0.500                    | 0.531                 | 0.542                 | 0.488                 | 0.546                 | 0.363                 | 0.523                 | 0.488    | 0.484                      | 0.453                    | 0.542                          | 0.527             | 0.523                    | 0.496                  | 0.437                    | 0.496                    | 0.503                    | 0.457                      | 0.519                      | 0.464 | 0.511             | 0.425             | 0.472                   | 0.476                   | 0.531                   | 0.554                   | 0.519                    | 0.265                |  -              | 0.500                     | 0.425                      |
 team_allocation                             | 0.412                    | 0.508                    | 0.436               | 0.512               | 0.496                | 0.352         | 0.460         | 0.484         | 0.460         | 0.476          | 0.572         | 0.588         | 0.364                   | 0.460                   | 0.336                         | 0.440                   | 0.396                   | 0.500               | 0.412                    | 0.512                 | 0.416                 | 0.468                 | 0.460                 | 0.380                 | 0.396                 | 0.488    | 0.504                      | 0.356                    | 0.448                          | 0.456             | 0.516                    | 0.540                  | 0.548                    | 0.520                    | 0.536                    | 0.480                      | 0.560                      | 0.628 | 0.440             | 0.084             | 0.444                   | 0.384                   | 0.512                   | 0.476                   | 0.556                    | 0.592                |  -              | 0.504                     | 0.556                      |
 MUSR                                        | 0.503                    | 0.543                    | 0.472               | 0.562               | 0.555                | 0.451         | 0.562         | 0.563         | 0.548         | 0.531          | 0.559         | 0.567         | 0.462                   | 0.482                   | 0.406                         | 0.482                   | 0.486                   | 0.521               | 0.496                    | 0.534                 | 0.525                 | 0.517                 | 0.530                 | 0.427                 | 0.498                 | 0.537    | 0.537                      | 0.441                    | 0.548                          | 0.525             | 0.437                    | 0.550                  | 0.531                    | 0.550                    | 0.558                    | 0.518                      | 0.559                      | 0.599 | 0.489             | 0.223             | 0.486                   | 0.490                   | 0.548                   | 0.538                   | 0.575                    | 0.497                |  -              | 0.530                     | 0.503                      |
 MMLUPRO
 biology                                     | 0.641                    | 0.728                    | 0.324               | 0.708               | 0.702                | 0.582         | 0.754         | 0.687         | 0.747         | 0.772          | 0.676         | 0.695         | 0.538                   | 0.608                   | 0.207                         | 0.523                   | 0.598                   | 0.687               | 0.627                    | 0.641                 | 0.675                 | 0.672                 | 0.686                 | 0.334                 | 0.623                 | 0.707    | 0.659                      | 0.582                    | 0.651                          | 0.619             | 0.592                    | 0.697                  | 0.656                    | 0.676                    | 0.702                    | 0.662                      | 0.725                      | 0.835 | 0.668             | 0.571             | 0.610                   | 0.638                   | 0.709                   | 0.729                   | 0.797                    | 0.764                |  -              | 0.570                     | 0.684                      |
 business                                    | 0.490                    | 0.624                    | 0.190               | 0.624               | 0.525                | 0.356         | 0.579         | 0.679         | 0.583         | 0.626          | 0.522         | 0.562         | 0.307                   | 0.423                   | 0.145                         | 0.338                   | 0.441                   | 0.510               | 0.465                    | 0.555                 | 0.558                 | 0.536                 | 0.558                 | 0.211                 | 0.458                 | 0.617    | 0.536                      | 0.335                    | 0.510                          | 0.404             | 0.429                    | 0.520                  | 0.396                    | 0.465                    | 0.571                    | 0.509                      | 0.476                      | 0.785 | 0.590             | 0.415             | 0.504                   | 0.558                   | 0.647                   | 0.661                   | 0.718                    | 0.755                |  -              | 0.335                     | 0.496                      |
 chemistry                                   | 0.399                    | 0.559                    | 0.166               | 0.639               | 0.500                | 0.271         | 0.502         | 0.639         | 0.503         | 0.546          | 0.465         | 0.467         | 0.227                   | 0.291                   | 0.106                         | 0.244                   | 0.325                   | 0.343               | 0.332                    | 0.414                 | 0.451                 | 0.439                 | 0.467                 | 0.161                 | 0.390                 | 0.545    | 0.375                      |  -                       | 0.366                          | 0.263             | 0.271                    | 0.456                  | 0.367                    | 0.431                    | 0.463                    | 0.296                      | 0.312                      | 0.765 | 0.413             | 0.271             | 0.387                   | 0.451                   | 0.559                   | 0.580                   | 0.684                    | 0.701                |  -              | 0.196                     | 0.407                      |
 computer_science                            | 0.424                    | 0.600                    | 0.197               | 0.602               | 0.590                | 0.300         | 0.473         | 0.507         | 0.482         | 0.560          | 0.497         | 0.502         | 0.329                   | 0.443                   | 0.124                         | 0.368                   | 0.446                   | 0.487               | 0.392                    | 0.521                 | 0.495                 | 0.534                 | 0.485                 | 0.195                 | 0.414                 | 0.551    | 0.541                      |  -                       | 0.456                          | 0.426             | 0.424                    | 0.495                  | 0.426                    | 0.458                    | 0.475                    | 0.448                      | 0.521                      | 0.734 | 0.482             | 0.370             | 0.434                   | 0.402                   | 0.590                   | 0.604                   | 0.663                    | 0.734                |  -              | 0.339                     | 0.512                      |
 economics                                   | 0.541                    | 0.659                    | 0.236               | 0.663               | 0.662                | 0.408         | 0.622         | 0.648         | 0.668         | 0.678          | 0.617         | 0.610         | 0.395                   | 0.510                   | 0.187                         | 0.427                   | 0.528                   | 0.540               | 0.510                    | 0.556                 | 0.569                 | 0.571                 | 0.568                 | 0.254                 | 0.492                 | 0.630    | 0.542                      |  -                       | 0.541                          | 0.484             | 0.490                    | 0.600                  | 0.555                    | 0.558                    | 0.609                    | 0.587                      | 0.575                      | 0.792 | 0.574             | 0.521             | 0.521                   | 0.550                   | 0.674                   | 0.687                   | 0.721                    | 0.787                |  -              | 0.463                     |  -                         |
 engineering                                 | 0.340                    | 0.420                    | 0.157               | 0.437               | 0.424                | 0.253         | 0.391         | 0.406         | 0.406         | 0.414          | 0.303         | 0.298         | 0.204                   | 0.284                   | 0.117                         | 0.240                   | 0.269                   | 0.301               | 0.311                    | 0.348                 | 0.391                 | 0.391                 | 0.378                 | 0.157                 | 0.302                 | 0.380    | 0.330                      |  -                       | 0.317                          | 0.237             | 0.237                    | 0.274                  | 0.264                    | 0.297                    | 0.297                    | 0.283                      | 0.342                      | 0.589 | 0.356             | 0.247             | 0.296                   | 0.309                   | 0.418                   | 0.420                   | 0.512                    | 0.573                |  -              | 0.180                     |  -                         |
 health                                      | 0.353                    | 0.506                    | 0.158               | 0.503               | 0.517                | 0.333         | 0.561         | 0.535         | 0.545         | 0.621          | 0.492         | 0.496         | 0.273                   | 0.433                   | 0.134                         | 0.322                   | 0.460                   | 0.458               | 0.465                    | 0.530                 | 0.556                 | 0.562                 | 0.558                 | 0.220                 | 0.437                 | 0.493    | 0.464                      |  -                       | 0.498                          | 0.422             | 0.414                    | 0.541                  | 0.433                    | 0.479                    | 0.515                    | 0.466                      | 0.588                      | 0.700 | 0.442             | 0.394             | 0.388                   | 0.416                   | 0.556                   | 0.569                   | 0.643                    | 0.690                |  -              | 0.381                     |  -                         |
 history                                     | 0.325                    | 0.477                    | 0.149               | 0.406               | 0.467                | 0.275         | 0.482         | 0.522         | 0.493         | 0.490          | 0.425         | 0.438         | 0.244                   | 0.388                   | 0.139                         | 0.259                   | 0.370                   | 0.391               | 0.406                    | 0.419                 | 0.433                 | 0.409                 | 0.451                 | 0.149                 | 0.380                 | 0.438    | 0.409                      |  -                       | 0.425                          | 0.359             | 0.364                    | 0.406                  | 0.398                    | 0.388                    | 0.380                    | 0.380                      | 0.496                      | 0.627 | 0.391             | 0.325             | 0.333                   | 0.367                   | 0.459                   | 0.464                   | 0.566                    | 0.624                |  -              | 0.380                     |  -                         |
 law                                         | 0.207                    | 0.318                    | 0.123               | 0.268               | 0.295                | 0.198         | 0.356         | 0.353         | 0.343         | 0.405          | 0.299         | 0.284         | 0.204                   | 0.266                   | 0.128                         | 0.195                   | 0.282                   | 0.237               | 0.260                    | 0.306                 | 0.295                 | 0.309                 | 0.303                 | 0.129                 | 0.243                 | 0.283    | 0.276                      |  -                       | 0.279                          | 0.238             | 0.262                    | 0.327                  | 0.285                    | 0.306                    | 0.276                    | 0                          | 0.384                      | 0.500 | 0.271             | 0.207             | 0.220                   | 0.237                   | 0.300                   | 0.292                   | 0.366                    | 0.455                |  -              | 0.217                     |  -                         |
 math                                        | 0.518                    | 0.686                    | 0.203               | 0.694               | 0.564                | 0.309         | 0.537         | 0.621         | 0.538         | 0.570          | 0.490         | 0.523         | 0.318                   | 0.417                   | 0.148                         | 0.367                   | 0.454                   | 0.508               | 0.382                    | 0.496                 | 0.532                 | 0.516                 | 0.555                 | 0.273                 | 0.511                 | 0.679    | 0.543                      |  -                       | 0.416                          | 0.369             | 0.418                    | 0.482                  | 0.369                    | 0.468                    | 0.522                    | 0.458                      | 0.391                      | 0.816 | 0.592             | 0.385             | 0.581                   | 0.603                   | 0.712                   | 0.723                   | 0.775                    | 0.814                |  -              | 0.270                     |  -                         |
 other                                       | 0.360                    | 0.514                    | 0.164               | 0.450               | 0.496                | 0.325         | 0.528         | 0.542         | 0.551         | 0.574          | 0.464         | 0.458         | 0.308                   | 0.423                   | 0.162                         | 0.312                   | 0.432                   | 0.440               | 0.411                    | 0.493                 | 0.482                 | 0.478                 | 0.487                 | 0.222                 | 0.389                 | 0.589    | 0.464                      |  -                       | 0.456                          | 0.416             | 0.401                    | 0.466                  | 0.396                    | 0.457                    | 0.500                    | 0.433                      | 0.532                      | 0.706 | 0.444             | 0.406             | 0.410                   | 0.405                   | 0.529                   | 0.551                   | 0.611                    | 0.664                |  -              | 0.400                     |  -                         |
 philosophy                                  | 0.300                    | 0.450                    | 0.148               | 0.442               | 0.462                | 0.272         | 0.460         | 0.436         | 0.448         | 0.488          | 0.408         | 0.412         | 0.286                   | 0.366                   | 0.142                         | 0.300                   | 0.352                   | 0.372               | 0.356                    | 0.434                 | 0.424                 | 0.438                 | 0.382                 | 0.192                 | 0.326                 | 0.424    | 0.366                      |  -                       | 0.390                          | 0.360             | 0.346                    | 0.422                  | 0.354                    | 0.386                    | 0.406                    | 0.390                      | 0.494                      | 0.633 | 0.374             | 0.336             | 0.376                   | 0.364                   | 0.480                   | 0.464                   | 0.557                    | 0.599                |  -              | 0.326                     |  -                         |
 physics                                     | 0.404                    | 0.557                    | 0.159               | 0.583               | 0.493                | 0.275         | 0.491         | 0.494         | 0.501         | 0.559          | 0.441         | 0.461         | 0.222                   | 0.318                   | 0.133                         | 0.280                   | 0.342                   | 0.344               | 0.334                    | 0.457                 | 0.484                 | 0.492                 | 0.488                 | 0.187                 | 0.397                 | 0.547    | 0.414                      |  -                       | 0.370                          | 0.317             | 0.309                    | 0.432                  | 0.352                    | 0.423                    | 0.455                    | 0.425                      | 0.367                      | 0.765 | 0.457             | 0.297             | 0.419                   | 0.456                   | 0.589                   | 0.602                   | 0.702                    | 0.543                |  -              | 0.240                     |  -                         |
 psychology                                  | 0.502                    | 0.642                    | 0.258               | 0.621               | 0.645                | 0.494         | 0.666         | 0.657         | 0.647         | 0.692          | 0.586         | 0.602         | 0.436                   | 0.531                   | 0.184                         | 0.448                   | 0.543                   | 0.572               | 0.560                    | 0.581                 | 0.624                 | 0.601                 | 0.637                 | 0.317                 | 0.518                 | 0.604    | 0.595                      |  -                       | 0.588                          | 0.543             | 0.552                    | 0.626                  | 0.573                    | 0.583                    | 0.621                    | 0.572                      | 0.676                      | 0.759 | 0.595             | 0.536             | 0.526                   | 0.563                   | 0.636                   | 0.644                   | 0.721                    | 0.749                |  -              | 0.525                     |  -                         |
 MMLUPRO                                     | 0.416                    | 0.554                    | 0.186               | 0.552               | 0.517                | 0.326         | 0.524         | 0.553         | 0.528         | 0.568          | 0.471         | 0.480         | 0.298                   | 0.395                   | 0.145                         | 0.324                   | 0.410                   | 0.432               | 0.404                    | 0.475                 | 0.494                 | 0.491                 | 0.499                 | 0.215                 | 0.419                 | 0.539    | 0.458                      | 0.453                    | 0.436                          | 0.376             | 0.382                    | 0.475                  | 0.405                    | 0.451                    | 0.482                    | 0.408                      | 0.470                      | 0.719 | 0.475             | 0.368             | 0.430                   | 0.457                   | 0.564                   | 0.575                   | 0.649                    | 0.671                |  -              | 0.326                     | 0.509                      |
 CATEGORIES
 REASONING                                   | 0.658                    | 0.774                    | 0.367               | 0.713               | 0.738                | 0.570         | 0.782         | 0.800         | 0.788         | 0.814          | 0.804         | 0.811         | 0.592                   | 0.702                   | 0.096                         | 0.561                   | 0.703                   | 0.845               | 0.684                    | 0.754                 | 0.702                 | 0.708                 | 0.713                 | 0.352                 | 0.606                 | 0.791    | 0.779                      | 0.628                    | 0.730                          | 0.785             | 0.799                    | 0.744                  | 0.713                    | 0.741                    | 0.724                    | 0.691                      | 0.805                      | 0.809 | 0.755             | 0.747             | 0.689                   | 0.719                   | 0.805                   | 0.809                   | 0.850                    | 0.874                | 0.885           | 0.784                     | 0.806                      |
 UNDERSTANDING                               | 0.548                    | 0.652                    | 0.366               | 0.644               | 0.670                | 0.538         | 0.708         | 0.712         | 0.707         | 0.742          | 0.661         | 0.670         | 0.511                   | 0.598                   | 0.116                         | 0.481                   | 0.599                   | 0.685               | 0.602                    | 0.680                 | 0.622                 | 0.618                 | 0.631                 | 0.330                 | 0.579                 | 0.672    | 0.633                      | 0.563                    | 0.633                          | 0.644             | 0.651                    | 0.661                  | 0.629                    | 0.629                    | 0.614                    | 0.622                      | 0.727                      | 0.728 | 0.674             | 0.649             | 0.605                   | 0.613                   | 0.692                   | 0.696                   | 0.761                    | 0.793                | 0.809           | 0.617                     | 0.713                      |
 LANGUAGE                                    | 0.613                    | 0.680                    | 0.524               | 0.688               | 0.692                | 0.624         | 0.733         | 0.732         | 0.735         | 0.755          | 0.786         | 0.783         | 0.746                   | 0.799                   | 0.665                         | 0.732                   | 0.790                   | 0.732               | 0.710                    | 0.729                 | 0.740                 | 0.738                 | 0.747                 | 0.610                 | 0.705                 | 0.715    | 0.776                      | 0.766                    | 0.714                          | 0.744             | 0.733                    | 0.613                  | 0.638                    | 0.618                    | 0.677                    | 0.613                      | 0.632                      | 0.750 | 0.735             | 0.564             | 0.685                   | 0.682                   | 0.722                   | 0.724                   | 0.769                    | 0.781                | 0.780           | 0.654                     | 0.708                      |
 KNOWLEDGE                                   | 0.516                    | 0.627                    | 0.354               | 0.476               | 0.496                | 0.505         | 0.677         | 0.710         | 0.689         | 0.733          | 0.553         | 0.543         | 0.406                   | 0.470                   | 0.266                         | 0.358                   | 0.404                   | 0.601               | 0.536                    | 0.633                 | 0.568                 | 0.571                 | 0.547                 | 0.066                 | 0.536                 | 0.580    | 0.526                      | 0.582                    | 0.511                          | 0.542             | 0.533                    | 0.581                  | 0.546                    | 0.546                    | 0.517                    | 0.519                      | 0.663                      | 0.678 | 0.530             | 0.585             | 0.469                   | 0.426                   | 0.595                   | 0.597                   | 0.693                    | 0.725                | 0.581           | 0.489                     | 0.521                      |
 COT                                         | 0.438                    | 0.561                    | 0.220               | 0.552               | 0.530                | 0.350         | 0.548         | 0.568         | 0.550         | 0.582          | 0.485         | 0.500         | 0.336                   | 0.431                   | 0.188                         | 0.365                   | 0.445                   | 0.505               | 0.446                    | 0.488                 | 0.522                 | 0.521                 | 0.530                 | 0.255                 | 0.446                 | 0.545    | 0.479                      | 0.498                    | 0.466                          | 0.416             | 0.424                    | 0.502                  | 0.437                    | 0.474                    | 0.506                    | 0.440                      | 0.503                      | 0.725 | 0.492             | 0.398             | 0.443                   | 0.462                   | 0.570                   | 0.581                   | 0.653                    | 0.684                |  -              | 0.377                     | 0.563                      |
 MATHCOT                                     | 0.731                    | 0.819                    | 0.369               | 0.745               | 0.752                | 0.480         | 0.728         | 0.733         | 0.735         | 0.740          | 0.682         | 0.679         | 0.545                   | 0.638                   | 0.359                         | 0.571                   | 0.649                   | 0.700               | 0.622                    | 0.680                 | 0.708                 | 0.721                 | 0.728                 | 0.386                 | 0.647                 | 0.808    | 0.750                      | 0.493                    | 0.652                          | 0.546             | 0.578                    | 0.695                  | 0.612                    | 0.591                    | 0.767                    | 0.638                      | 0.665                      | 0.919 | 0.671             | 0.548             | 0.667                   | 0.694                   | 0.823                   | 0.829                   | 0.869                    | 0.903                | 0.927           | 0.535                     | 0.662                      |
 CODE                                        | 0.416                    | 0.498                    | 0.176               | 0.499               | 0.534                | 0.331         | 0.487         | 0.483         | 0.495         | 0.568          | 0.456         | 0.475         | 0.339                   | 0.410                   | 0.210                         | 0.356                   | 0.466                   | 0.344               | 0.269                    | 0.411                 | 0.440                 | 0.439                 | 0.463                 | 0.217                 | 0.366                 | 0.498    | 0.514                      | 0.321                    | 0.326                          | 0.324             | 0.316                    | 0.430                  | 0.372                    | 0.389                    | 0.427                    | 0.376                      | 0.350                      | 0.568 | 0.390             | 0.346             | 0.437                   | 0.445                   | 0.510                   | 0.528                   | 0.578                    | 0.612                | 0.321           | 0.233                     | 0.368                      |
 DISCIPLINES
 NLP                                         | 0.620                    | 0.720                    | 0.408               | 0.647               | 0.670                | 0.568         | 0.751         | 0.767         | 0.755         | 0.786          | 0.729         | 0.728         | 0.588                   | 0.667                   | 0.263                         | 0.549                   | 0.648                   | 0.774               | 0.650                    | 0.713                 | 0.675                 | 0.678                 | 0.677                 | 0.329                 | 0.609                 | 0.722    | 0.723                      | 0.642                    | 0.685                          | 0.739             | 0.737                    | 0.681                  | 0.655                    | 0.667                    | 0.647                    | 0.637                      | 0.744                      | 0.755 | 0.693             | 0.687             | 0.632                   | 0.630                   | 0.731                   | 0.734                   | 0.791                    | 0.818                | 0.772           | 0.712                     | 0.725                      |
 MATH                                        | 0.597                    | 0.712                    | 0.294               | 0.674               | 0.659                | 0.398         | 0.634         | 0.650         | 0.637         | 0.653          | 0.590         | 0.597         | 0.445                   | 0.544                   | 0.265                         | 0.482                   | 0.564                   | 0.638               | 0.528                    | 0.594                 | 0.612                 | 0.613                 | 0.629                 | 0.318                 | 0.556                 | 0.711    | 0.634                      | 0.451                    | 0.558                          | 0.483             | 0.508                    | 0.611                  | 0.527                    | 0.525                    | 0.646                    | 0.543                      | 0.585                      | 0.817 | 0.612             | 0.493             | 0.576                   | 0.599                   | 0.741                   | 0.747                   | 0.799                    | 0.843                | 0.927           | 0.455                     | 0.615                      |
 SCIENCE                                     | 0.606                    | 0.726                    | 0.350               | 0.741               | 0.713                | 0.555         | 0.735         | 0.749         | 0.739         | 0.769          | 0.686         | 0.698         | 0.481                   | 0.576                   | 0.093                         | 0.480                   | 0.579                   | 0.664               | 0.608                    | 0.685                 | 0.668                 | 0.668                 | 0.676                 | 0.346                 | 0.605                 | 0.716    | 0.621                      | 0.673                    | 0.618                          | 0.581             | 0.596                    | 0.701                  | 0.658                    | 0.685                    | 0.696                    | 0.660                      | 0.697                      | 0.845 | 0.674             | 0.618             | 0.629                   | 0.657                   | 0.738                   | 0.748                   | 0.815                    | 0.806                | 0.946           | 0.544                     | 0.731                      |
 ENGINEERING                                 | 0.349                    | 0.429                    | 0.166               | 0.464               | 0.453                | 0.280         | 0.412         | 0.426         | 0.426         | 0.438          | 0.334         | 0.333         | 0.224                   | 0.303                   | 0.108                         | 0.253                   | 0.289                   | 0.333               | 0.332                    | 0.386                 | 0.412                 | 0.404                 | 0.397                 | 0.169                 | 0.323                 | 0.407    | 0.346                      | 0.393                    | 0.339                          | 0.267             | 0.272                    | 0.305                  | 0.300                    | 0.319                    | 0.323                    | 0.308                      | 0.371                      | 0.595 | 0.388             | 0.283             | 0.315                   | 0.325                   | 0.443                   | 0.444                   | 0.530                    | 0.590                |  -              | 0.199                     | 0.586                      |
 MEDICINE                                    | 0.379                    | 0.504                    | 0.216               | 0.524               | 0.540                | 0.400         | 0.591         | 0.590         | 0.595         | 0.648          | 0.521         | 0.530         | 0.346                   | 0.464                   | 0.069                         | 0.347                   | 0.469                   | 0.515               | 0.544                    | 0.620                 | 0.568                 | 0.572                 | 0.577                 | 0.243                 | 0.496                 | 0.543    | 0.512                      | 0.447                    | 0.541                          | 0.485             | 0.503                    | 0.558                  | 0.507                    | 0.525                    | 0.537                    | 0.501                      | 0.633                      | 0.672 | 0.510             | 0.482             | 0.459                   | 0.478                   | 0.574                   | 0.580                   | 0.655                    | 0.702                | 0.598           | 0.457                     | 0.635                      |
 HUMANITIES                                  | 0.472                    | 0.594                    | 0.291               | 0.572               | 0.615                | 0.485         | 0.643         | 0.652         | 0.645         | 0.679          | 0.593         | 0.610         | 0.416                   | 0.521                   | 0.094                         | 0.395                   | 0.517                   | 0.603               | 0.545                    | 0.624                 | 0.571                 | 0.563                 | 0.578                 | 0.292                 | 0.529                 | 0.609    | 0.552                      | 0.536                    | 0.552                          | 0.535             | 0.547                    | 0.605                  | 0.567                    | 0.567                    | 0.588                    | 0.567                      | 0.671                      | 0.740 | 0.597             | 0.544             | 0.527                   | 0.533                   | 0.629                   | 0.638                   | 0.716                    | 0.742                | 0.600           | 0.508                     | 0.660                      |
 BUSINESS                                    | 0.536                    | 0.657                    | 0.252               | 0.679               | 0.655                | 0.450         | 0.660         | 0.697         | 0.678         | 0.709          | 0.623         | 0.637         | 0.408                   | 0.523                   | 0.115                         | 0.428                   | 0.530                   | 0.594               | 0.537                    | 0.626                 | 0.592                 | 0.582                 | 0.598                 | 0.251                 | 0.517                 | 0.667    | 0.564                      | 0.466                    | 0.565                          | 0.514             | 0.526                    | 0.632                  | 0.575                    | 0.589                    | 0.637                    | 0.604                      | 0.636                      | 0.801 | 0.644             | 0.566             | 0.565                   | 0.596                   | 0.701                   | 0.710                   | 0.759                    | 0.802                |  -              | 0.479                     | 0.652                      |
 LAW                                         | 0.316                    | 0.419                    | 0.200               | 0.396               | 0.427                | 0.302         | 0.489         | 0.485         | 0.483         | 0.524          | 0.417         | 0.429         | 0.280                   | 0.344                   | 0.075                         | 0.258                   | 0.348                   | 0.420               | 0.374                    | 0.412                 | 0.380                 | 0.397                 | 0.406                 | 0.175                 | 0.344                 | 0.420    | 0.365                      | 0.370                    | 0.367                          | 0.367             | 0.374                    | 0.427                  | 0.393                    | 0.399                    | 0.392                    | 0.310                      | 0.498                      | 0.541 | 0.446             | 0.376             | 0.374                   | 0.383                   | 0.451                   | 0.456                   | 0.541                    | 0.604                |  -              | 0.327                     | 0.465                      |
 COMPOSITE AVERAGE
 AVG                                         | 0.561                    | 0.668                    | 0.342               | 0.627               | 0.641                | 0.502         | 0.688         | 0.701         | 0.692         | 0.724          | 0.648         | 0.654         | 0.493                   | 0.581                   | 0.198                         | 0.476                   | 0.576                   | 0.670               | 0.582                    | 0.651                 | 0.622                 | 0.623                 | 0.629                 | 0.306                 | 0.561                 | 0.668    | 0.635                      | 0.578                    | 0.598                          | 0.610             | 0.616                    | 0.634                  | 0.595                    | 0.606                    | 0.616                    | 0.585                      | 0.674                      | 0.748 | 0.633             | 0.597             | 0.578                   | 0.586                   | 0.686                   | 0.691                   | 0.754                    | 0.783                | 0.759           | 0.578                     | 0.675                      |


CODE MODELS:

 TEST                                        | codegemma-2b | codegemma-1.1-7b-it | codegemma-7b | CodeLlama-7b-hf | Codestral-22B-v0.1 | Codestral-22B-Instruct-v0.1 | CodeQwen1.5-7B-Chat | CodeQwen1.5-7B | CodeQwen1.5-7B | granite-8b-code-instruct | Qwen2.5-Coder-0.5B-32k-Instruct | Qwen2.5-Coder-1.5B-Instruct | Qwen2.5-3B-32k-Instruct | Qwen2.5-Coder-3B-Instruct | Qwen2.5-Coder-7B-Instruct | Qwen2.5-Coder-7B | Qwen2.5-Coder-7B | Qwen2.5-Coder-14B-Instruct | Qwen2.5-Coder-14B | Qwen2.5-Coder-32B-Instruct |
---------------------------------------------|--------------|---------------------|--------------|-----------------|--------------------|-----------------------------|---------------------|----------------|----------------|--------------------------|---------------------------------|-----------------------------|-------------------------|---------------------------|---------------------------|------------------|------------------|----------------------------|-------------------|----------------------------|
 params                                      | 2.51B        | 8.54B               | 8.54B        | 6.74B           | 22B                | 22B                         | 7.25B               | 7.25B          | 7.25B          | 8.05B                    | 0.49403B                        | 1.54B                       | 3.09B                   | 3.09B                     | 7.62B                     | 7.62B            | 7.62B            | 14.77B                     | 14.77B            | 32.76B                     |
 quant                                       | Q6_K         | Q6_K                | Q6_K         | IQ4_XS          | IQ4_XS             | IQ4_XS                      | Q6_K                | Q6_K           | Q8_0           | Q4_K_M                   | Q6_K                            | Q6_K                        | Q6_K                    | Q6_K                      | IQ4_XS                    | IQ4_XS           | Q6_K             | IQ4_XS                     | IQ4_XS            | IQ4_XS                     |
 engine                                      | llama.cpp version: 4255 | llama.cpp version: 4150 | llama.cpp version: 4255 | llama.cpp version: 4191 | llama.cpp version: 4132 | llama.cpp version: 4191     | llama.cpp version: 4094 | llama.cpp version: 4132 | llama.cpp version: 4191 | llama.cpp version: 4080  | llama.cpp version: 4150         | llama.cpp version: 4150     | llama.cpp version: 4150 | llama.cpp version: 4150   | llama.cpp version: 4094   | llama.cpp version: 4295 | llama.cpp version: 4132 | llama.cpp version: 4120    | llama.cpp version: 4150 | llama.cpp version: 4150    |
---------------------------------------------|--------------|---------------------|--------------|-----------------|--------------------|-----------------------------|---------------------|----------------|----------------|--------------------------|---------------------------------|-----------------------------|-------------------------|---------------------------|---------------------------|------------------|------------------|----------------------------|-------------------|----------------------------|
 HUMANEVAL                                   | 0.292        | 0.591               | 0.451        | 0.280           | 0.664              | 0.810                       | 0.859               | 0.518          | 0.567          | 0.487                    | 0.518                           | 0.676                       | 0.780                   | 0.835                     | 0.829                     | 0.640            | 0.713            | 0.878                      | 0.676             | 0.884                      |
 HUMANEVALP                                  | 0.201        | 0.475               | 0.335        | 0.182           | 0.554              | 0.682                       | 0.701               | 0.414          | 0.445          | 0.365                    | 0.432                           | 0.567                       | 0.682                   | 0.719                     | 0.707                     | 0.530            | 0.579            | 0.756                      | 0.536             | 0.756                      |
 MBPP                                        | 0.447        | 0.552               | 0.521        | 0.404           | 0.630              | 0.653                       | 0.712               | 0.536          | 0.525          | 0.501                    | 0.408                           | 0.560                       | 0.599                   | 0.618                     | 0.735                     | 0.614            | 0.571            | 0.727                      | 0.661             | 0.715                      |
 MBPPP                                       | 0.415        | 0.517               | 0.455        | 0.375           | 0.558              | 0.593                       | 0.665               | 0.486          | 0.486          | 0.473                    | 0.352                           | 0.504                       | 0.584                   | 0.589                     | 0.687                     | 0.540            | 0.513            | 0.665                      | 0.558             | 0.669                      |
 HUMANEVALFIM                                | 0.268        |  -                  | 0.463        |  -              | 0.719              | 0.719                       | 0.731               | 0.518          | 0.475          | 0.402                    | 0.518                           | 0.524                       |  -                      | 0.634                     | 0.493                     | 0.713            | 0.756            | 0.829                      | 0.518             | 0.890                      |
 HUMANEVALX_cpp                              | 0.170        | 0.359               | 0.384        | 0.256           | 0.640              | 0.621                       | 0.676               | 0.463          | 0.475          | 0.457                    | 0.286                           | 0.426                       | 0.237                   | 0.567                     | 0.676                     | 0.548            | 0.475            | 0.506                      | 0.573             | 0.689                      |
 HUMANEVALX_java                             | 0.341        | 0.469               | 0.493        | 0.371           | 0.756              | 0.670                       | 0.774               | 0.591          | 0.609          | 0.524                    | 0.512                           | 0.609                       | 0.615                   | 0.743                     | 0.798                     | 0.725            | 0.652            | 0.201                      | 0.762             | 0.841                      |
 HUMANEVALX_js                               | 0.347        | 0.560               | 0.493        | 0.347           | 0.658              | 0.621                       | 0.768               | 0.567          | 0.585          | 0.493                    | 0.493                           | 0.615                       | 0.682                   | 0.670                     | 0.798                     | 0.628            | 0.658            | 0.817                      | 0.695             | 0.835                      |
 HUMANEVALX                                  | 0.286        | 0.463               | 0.457        | 0.325           | 0.684              | 0.638                       | 0.739               | 0.540          | 0.556          | 0.491                    | 0.430                           | 0.550                       | 0.512                   | 0.660                     | 0.758                     | 0.634            | 0.595            | 0.508                      | 0.676             | 0.788                      |
 CRUXEVAL_input                              | 0.057        | 0.406               | 0.206        | 0.156           | 0.438              | 0.351                       | 0.456               | 0.192          | 0.203          | 0.358                    | 0.435                           | 0.416                       | 0.347                   | 0.481                     | 0.578                     | 0.255            | 0.267            | 0.677                      | 0.281             | 0.676                      |
 CRUXEVAL_output                             | 0.253        | 0.368               | 0.306        | 0.281           | 0.465              | 0.447                       | 0.363               | 0.363          | 0.363          | 0.322                    | 0.278                           | 0.332                       | 0.311                   | 0.413                     | 0.507                     | 0.381            | 0.435            | 0.577                      | 0.422             | 0.610                      |
 CRUXEVAL                                    | 0.155        | 0.387               | 0.256        | 0.218           | 0.451              | 0.399                       | 0.410               | 0.278          | 0.283          | 0.340                    | 0.356                           | 0.374                       | 0.329                   | 0.447                     | 0.543                     | 0.318            | 0.351            | 0.627                      | 0.351             | 0.643                      |
 CRUXEVALFIM_input                           | 0.325        |  -                  | 0.378        |  -              | 0.295              | 0.351                       | 0.237               | 0.206          | 0.210          | 0.171                    | 0.017                           | 0.155                       |  -                      | 0.208                     | 0.322                     | 0.296            | 0.313            | 0.421                      | 0.346             | 0.515                      |
 CRUXEVALFIM_output                          | 0.153        |  -                  | 0.332        |  -              | 0.441              | 0.355                       | 0.212               | 0.280          | 0.266          | 0.323                    | 0.098                           | 0.222                       |  -                      | 0.323                     | 0.481                     | 0.352            | 0.365            | 0.546                      | 0.481             | 0.557                      |
 CRUXEVALFIM                                 | 0.239        |  -                  | 0.355        |  -              | 0.368              | 0.353                       | 0.225               | 0.243          | 0.238          | 0.247                    | 0.058                           | 0.188                       |  -                      | 0.266                     | 0.401                     | 0.324            | 0.339            | 0.483                      | 0.413             | 0.536                      |
 CODE                                        | 0.237        | 0.441               | 0.352        | 0.266           | 0.483              | 0.467                       | 0.447               | 0.336          | 0.342          | 0.348                    | 0.278                           | 0.368                       | 0.449                   | 0.453                     | 0.548                     | 0.413            | 0.427            | 0.593                      | 0.458             | 0.648                      |