File size: 12,594 Bytes
94d6d71
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
common_init_from_params: setting dry_penalty_last_n to ctx_size = 768
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

system_info: n_threads = 6 (n_threads_batch = 6) / 12 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | AARCH64_REPACK = 1 |
multiple_choice_score: there are 869 tasks in prompt
multiple_choice_score: selecting 750 random tasks from 869 tasks available
multiple_choice_score: preparing task data...done
multiple_choice_score : calculating TruthfulQA score over 750 tasks.

task	acc_norm
1	100.00000000
2	50.00000000
3	66.66666667
4	50.00000000
5	60.00000000
6	66.66666667
7	57.14285714
8	62.50000000
9	55.55555556
10	50.00000000
11	45.45454545
12	50.00000000
13	46.15384615
14	50.00000000
15	53.33333333
16	50.00000000
17	52.94117647
18	50.00000000
19	47.36842105
20	45.00000000
21	42.85714286
22	45.45454545
23	43.47826087
24	45.83333333
25	44.00000000
26	42.30769231
27	44.44444444
28	42.85714286
29	41.37931034
30	43.33333333
31	41.93548387
32	43.75000000
33	45.45454545
34	47.05882353
35	45.71428571
36	44.44444444
37	45.94594595
38	44.73684211
39	43.58974359
40	45.00000000
41	46.34146341
42	45.23809524
43	46.51162791
44	45.45454545
45	44.44444444
46	45.65217391
47	46.80851064
48	47.91666667
49	46.93877551
50	48.00000000
51	47.05882353
52	48.07692308
53	49.05660377
54	48.14814815
55	47.27272727
56	46.42857143
57	47.36842105
58	48.27586207
59	47.45762712
60	48.33333333
61	49.18032787
62	48.38709677
63	49.20634921
64	50.00000000
65	49.23076923
66	50.00000000
67	50.74626866
68	51.47058824
69	52.17391304
70	51.42857143
71	50.70422535
72	50.00000000
73	50.68493151
74	51.35135135
75	52.00000000
76	52.63157895
77	53.24675325
78	53.84615385
79	53.16455696
80	52.50000000
81	51.85185185
82	52.43902439
83	51.80722892
84	52.38095238
85	52.94117647
86	53.48837209
87	54.02298851
88	54.54545455
89	55.05617978
90	55.55555556
91	56.04395604
92	56.52173913
93	55.91397849
94	55.31914894
95	55.78947368
96	56.25000000
97	55.67010309
98	56.12244898
99	56.56565657
100	56.00000000
101	55.44554455
102	55.88235294
103	55.33980583
104	54.80769231
105	55.23809524
106	54.71698113
107	55.14018692
108	54.62962963
109	54.12844037
110	54.54545455
111	54.95495495
112	55.35714286
113	54.86725664
114	54.38596491
115	54.78260870
116	54.31034483
117	54.70085470
118	55.08474576
119	54.62184874
120	55.00000000
121	54.54545455
122	54.91803279
123	55.28455285
124	54.83870968
125	55.20000000
126	55.55555556
127	55.90551181
128	55.46875000
129	55.03875969
130	54.61538462
131	54.96183206
132	55.30303030
133	55.63909774
134	55.22388060
135	54.81481481
136	54.41176471
137	54.74452555
138	55.07246377
139	54.67625899
140	55.00000000
141	55.31914894
142	55.63380282
143	55.94405594
144	55.55555556
145	55.86206897
146	55.47945205
147	55.78231293
148	55.40540541
149	55.70469799
150	56.00000000
151	55.62913907
152	55.92105263
153	56.20915033
154	55.84415584
155	55.48387097
156	55.76923077
157	56.05095541
158	56.32911392
159	56.60377358
160	56.25000000
161	55.90062112
162	56.17283951
163	55.82822086
164	55.48780488
165	55.75757576
166	56.02409639
167	55.68862275
168	55.95238095
169	55.62130178
170	55.88235294
171	56.14035088
172	55.81395349
173	56.06936416
174	56.32183908
175	56.00000000
176	55.68181818
177	55.93220339
178	55.61797753
179	55.30726257
180	55.00000000
181	54.69613260
182	54.94505495
183	54.64480874
184	54.89130435
185	55.13513514
186	55.37634409
187	55.08021390
188	54.78723404
189	54.49735450
190	54.21052632
191	54.45026178
192	54.16666667
193	53.88601036
194	54.12371134
195	54.35897436
196	54.59183673
197	54.82233503
198	55.05050505
199	55.27638191
200	55.50000000
201	55.72139303
202	55.94059406
203	55.66502463
204	55.88235294
205	55.60975610
206	55.33980583
207	55.55555556
208	55.76923077
209	55.98086124
210	56.19047619
211	56.39810427
212	56.13207547
213	55.86854460
214	56.07476636
215	56.27906977
216	56.01851852
217	55.76036866
218	55.96330275
219	56.16438356
220	55.90909091
221	55.65610860
222	55.40540541
223	55.15695067
224	55.35714286
225	55.11111111
226	54.86725664
227	55.06607930
228	55.26315789
229	55.45851528
230	55.65217391
231	55.41125541
232	55.17241379
233	55.36480687
234	55.12820513
235	54.89361702
236	55.08474576
237	54.85232068
238	55.04201681
239	55.23012552
240	55.41666667
241	55.60165975
242	55.37190083
243	55.14403292
244	54.91803279
245	54.69387755
246	54.47154472
247	54.65587045
248	54.43548387
249	54.21686747
250	54.40000000
251	54.58167331
252	54.76190476
253	54.54545455
254	54.33070866
255	54.50980392
256	54.68750000
257	54.86381323
258	55.03875969
259	54.82625483
260	55.00000000
261	55.17241379
262	55.34351145
263	55.51330798
264	55.68181818
265	55.84905660
266	55.63909774
267	55.80524345
268	55.59701493
269	55.76208178
270	55.92592593
271	55.71955720
272	55.51470588
273	55.67765568
274	55.47445255
275	55.27272727
276	55.43478261
277	55.59566787
278	55.39568345
279	55.19713262
280	55.00000000
281	55.16014235
282	55.31914894
283	55.12367491
284	55.28169014
285	55.08771930
286	55.24475524
287	55.05226481
288	55.20833333
289	55.01730104
290	55.17241379
291	55.32646048
292	55.13698630
293	55.29010239
294	55.10204082
295	55.25423729
296	55.40540541
297	55.55555556
298	55.36912752
299	55.51839465
300	55.33333333
301	55.14950166
302	55.29801325
303	55.11551155
304	55.26315789
305	55.40983607
306	55.22875817
307	55.37459283
308	55.51948052
309	55.66343042
310	55.80645161
311	55.94855305
312	55.76923077
313	55.91054313
314	56.05095541
315	55.87301587
316	56.01265823
317	56.15141956
318	55.97484277
319	55.79937304
320	55.62500000
321	55.45171340
322	55.59006211
323	55.41795666
324	55.55555556
325	55.38461538
326	55.52147239
327	55.35168196
328	55.18292683
329	55.01519757
330	55.15151515
331	55.28700906
332	55.12048193
333	55.25525526
334	55.08982036
335	54.92537313
336	54.76190476
337	54.59940653
338	54.43786982
339	54.57227139
340	54.70588235
341	54.83870968
342	54.67836257
343	54.81049563
344	54.94186047
345	54.78260870
346	54.91329480
347	54.75504323
348	54.59770115
349	54.72779370
350	54.57142857
351	54.70085470
352	54.54545455
353	54.39093484
354	54.23728814
355	54.08450704
356	53.93258427
357	53.78151261
358	53.91061453
359	53.76044568
360	53.88888889
361	54.01662050
362	54.14364641
363	54.26997245
364	54.12087912
365	53.97260274
366	53.82513661
367	53.67847411
368	53.80434783
369	53.92953930
370	53.78378378
371	53.63881402
372	53.49462366
373	53.35120643
374	53.47593583
375	53.60000000
376	53.72340426
377	53.84615385
378	53.96825397
379	53.82585752
380	53.94736842
381	53.80577428
382	53.66492147
383	53.52480418
384	53.38541667
385	53.50649351
386	53.62694301
387	53.48837209
388	53.35051546
389	53.21336761
390	53.33333333
391	53.19693095
392	53.31632653
393	53.18066158
394	53.29949239
395	53.41772152
396	53.28282828
397	53.40050378
398	53.26633166
399	53.13283208
400	53.25000000
401	53.11720698
402	53.23383085
403	53.34987593
404	53.46534653
405	53.58024691
406	53.69458128
407	53.80835381
408	53.92156863
409	53.78973105
410	53.90243902
411	54.01459854
412	54.12621359
413	53.99515738
414	54.10628019
415	53.97590361
416	53.84615385
417	53.71702638
418	53.58851675
419	53.46062053
420	53.57142857
421	53.44418052
422	53.55450237
423	53.66430260
424	53.53773585
425	53.41176471
426	53.28638498
427	53.16159251
428	53.03738318
429	53.14685315
430	53.25581395
431	53.36426914
432	53.24074074
433	53.11778291
434	52.99539171
435	53.10344828
436	53.21100917
437	53.31807780
438	53.19634703
439	53.30296128
440	53.40909091
441	53.28798186
442	53.16742081
443	53.04740406
444	52.92792793
445	53.03370787
446	53.13901345
447	53.24384787
448	53.34821429
449	53.45211581
450	53.33333333
451	53.43680710
452	53.53982301
453	53.64238411
454	53.52422907
455	53.40659341
456	53.28947368
457	53.17286652
458	53.05676856
459	53.15904139
460	53.04347826
461	52.92841649
462	53.03030303
463	52.91576674
464	52.80172414
465	52.90322581
466	53.00429185
467	52.89079229
468	52.99145299
469	52.87846482
470	52.97872340
471	53.07855626
472	53.17796610
473	53.06553911
474	52.95358650
475	52.84210526
476	52.73109244
477	52.83018868
478	52.92887029
479	53.02713987
480	52.91666667
481	52.80665281
482	52.90456432
483	53.00207039
484	53.09917355
485	53.19587629
486	53.29218107
487	53.38809035
488	53.48360656
489	53.37423313
490	53.26530612
491	53.36048880
492	53.25203252
493	53.14401623
494	53.23886640
495	53.13131313
496	53.02419355
497	53.11871227
498	53.21285141
499	53.30661323
500	53.40000000
501	53.49301397
502	53.58565737
503	53.67793241
504	53.76984127
505	53.86138614
506	53.75494071
507	53.64891519
508	53.54330709
509	53.63457760
510	53.72549020
511	53.62035225
512	53.71093750
513	53.60623782
514	53.69649805
515	53.78640777
516	53.68217054
517	53.57833656
518	53.47490347
519	53.37186898
520	53.26923077
521	53.35892514
522	53.44827586
523	53.34608031
524	53.24427481
525	53.14285714
526	53.04182510
527	52.94117647
528	52.84090909
529	52.74102079
530	52.83018868
531	52.91902072
532	53.00751880
533	53.09568480
534	52.99625468
535	52.89719626
536	52.79850746
537	52.88640596
538	52.78810409
539	52.87569573
540	52.77777778
541	52.68022181
542	52.76752768
543	52.85451197
544	52.75735294
545	52.84403670
546	52.74725275
547	52.65082267
548	52.73722628
549	52.82331512
550	52.72727273
551	52.63157895
552	52.53623188
553	52.62206148
554	52.52707581
555	52.61261261
556	52.51798561
557	52.60323160
558	52.50896057
559	52.59391771
560	52.67857143
561	52.76292335
562	52.84697509
563	52.93072824
564	52.83687943
565	52.74336283
566	52.82685512
567	52.91005291
568	52.99295775
569	52.89982425
570	52.80701754
571	52.88966725
572	52.79720280
573	52.70506108
574	52.78745645
575	52.86956522
576	52.95138889
577	53.03292894
578	53.11418685
579	53.19516408
580	53.27586207
581	53.18416523
582	53.09278351
583	53.17324185
584	53.25342466
585	53.33333333
586	53.41296928
587	53.32197615
588	53.23129252
589	53.31069610
590	53.38983051
591	53.46869712
592	53.54729730
593	53.45699831
594	53.53535354
595	53.61344538
596	53.52348993
597	53.43383585
598	53.34448161
599	53.42237062
600	53.33333333
601	53.41098170
602	53.32225914
603	53.23383085
604	53.14569536
605	53.05785124
606	52.97029703
607	52.88303130
608	52.96052632
609	52.87356322
610	52.95081967
611	52.86415712
612	52.77777778
613	52.69168026
614	52.60586319
615	52.68292683
616	52.75974026
617	52.83630470
618	52.75080906
619	52.66558966
620	52.74193548
621	52.65700483
622	52.57234727
623	52.64847512
624	52.56410256
625	52.48000000
626	52.39616613
627	52.47208931
628	52.54777070
629	52.62321145
630	52.53968254
631	52.45641838
632	52.37341772
633	52.29067930
634	52.20820189
635	52.12598425
636	52.20125786
637	52.11930926
638	52.19435737
639	52.11267606
640	52.03125000
641	52.10608424
642	52.02492212
643	51.94401244
644	52.01863354
645	51.93798450
646	51.85758514
647	51.77743431
648	51.69753086
649	51.61787365
650	51.53846154
651	51.61290323
652	51.53374233
653	51.45482389
654	51.52905199
655	51.45038168
656	51.52439024
657	51.44596651
658	51.51975684
659	51.44157815
660	51.51515152
661	51.43721634
662	51.51057402
663	51.58371041
664	51.65662651
665	51.72932331
666	51.80180180
667	51.87406297
668	51.79640719
669	51.71898356
670	51.79104478
671	51.71385991
672	51.63690476
673	51.56017831
674	51.48367953
675	51.40740741
676	51.33136095
677	51.40324963
678	51.47492625
679	51.54639175
680	51.61764706
681	51.54185022
682	51.46627566
683	51.39092240
684	51.31578947
685	51.38686131
686	51.31195335
687	51.23726346
688	51.30813953
689	51.37880987
690	51.44927536
691	51.51953690
692	51.58959538
693	51.65945166
694	51.72910663
695	51.79856115
696	51.72413793
697	51.79340029
698	51.71919771
699	51.64520744
700	51.71428571
701	51.78316690
702	51.85185185
703	51.77809388
704	51.84659091
705	51.91489362
706	51.84135977
707	51.90947666
708	51.97740113
709	52.04513399
710	52.11267606
711	52.18002813
712	52.24719101
713	52.31416550
714	52.24089636
715	52.16783217
716	52.23463687
717	52.16178522
718	52.08913649
719	52.01668985
720	51.94444444
721	51.87239945
722	51.80055402
723	51.86721992
724	51.93370166
725	52.00000000
726	51.92837466
727	51.85694635
728	51.92307692
729	51.98902606
730	52.05479452
731	52.12038304
732	52.18579235
733	52.25102319
734	52.31607629
735	52.24489796
736	52.30978261
737	52.23880597
738	52.16802168
739	52.23274696
740	52.16216216
741	52.22672065
742	52.29110512
743	52.35531629
744	52.28494624
745	52.34899329
746	52.27882038
747	52.20883534
748	52.13903743
749	52.06942590
750	52.13333333

Final result: 52.1333 ±1.8253
Random chance: 25.0083 ±1.5824