File size: 12,594 Bytes
94d6d71
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
common_init_from_params: setting dry_penalty_last_n to ctx_size = 768
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

system_info: n_threads = 6 (n_threads_batch = 6) / 12 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | AARCH64_REPACK = 1 |
multiple_choice_score: there are 869 tasks in prompt
multiple_choice_score: selecting 750 random tasks from 869 tasks available
multiple_choice_score: preparing task data...done
multiple_choice_score : calculating TruthfulQA score over 750 tasks.

task	acc_norm
1	100.00000000
2	50.00000000
3	66.66666667
4	50.00000000
5	60.00000000
6	66.66666667
7	57.14285714
8	62.50000000
9	55.55555556
10	50.00000000
11	45.45454545
12	41.66666667
13	38.46153846
14	42.85714286
15	46.66666667
16	43.75000000
17	47.05882353
18	44.44444444
19	42.10526316
20	40.00000000
21	38.09523810
22	40.90909091
23	39.13043478
24	41.66666667
25	40.00000000
26	38.46153846
27	40.74074074
28	39.28571429
29	37.93103448
30	40.00000000
31	38.70967742
32	37.50000000
33	39.39393939
34	41.17647059
35	40.00000000
36	38.88888889
37	40.54054054
38	42.10526316
39	41.02564103
40	42.50000000
41	43.90243902
42	45.23809524
43	46.51162791
44	45.45454545
45	44.44444444
46	45.65217391
47	46.80851064
48	47.91666667
49	46.93877551
50	48.00000000
51	47.05882353
52	48.07692308
53	49.05660377
54	48.14814815
55	47.27272727
56	48.21428571
57	49.12280702
58	50.00000000
59	49.15254237
60	50.00000000
61	50.81967213
62	50.00000000
63	50.79365079
64	50.00000000
65	49.23076923
66	50.00000000
67	49.25373134
68	50.00000000
69	50.72463768
70	50.00000000
71	49.29577465
72	48.61111111
73	49.31506849
74	50.00000000
75	50.66666667
76	51.31578947
77	51.94805195
78	52.56410256
79	51.89873418
80	52.50000000
81	51.85185185
82	52.43902439
83	51.80722892
84	52.38095238
85	52.94117647
86	53.48837209
87	54.02298851
88	54.54545455
89	55.05617978
90	55.55555556
91	56.04395604
92	56.52173913
93	55.91397849
94	55.31914894
95	55.78947368
96	56.25000000
97	55.67010309
98	56.12244898
99	56.56565657
100	57.00000000
101	56.43564356
102	56.86274510
103	56.31067961
104	55.76923077
105	56.19047619
106	55.66037736
107	56.07476636
108	55.55555556
109	55.04587156
110	55.45454545
111	55.85585586
112	56.25000000
113	55.75221239
114	55.26315789
115	55.65217391
116	56.03448276
117	56.41025641
118	56.77966102
119	56.30252101
120	56.66666667
121	56.19834711
122	56.55737705
123	56.91056911
124	56.45161290
125	56.80000000
126	57.14285714
127	57.48031496
128	57.03125000
129	56.58914729
130	56.15384615
131	56.48854962
132	56.81818182
133	57.14285714
134	56.71641791
135	56.29629630
136	56.61764706
137	56.93430657
138	57.24637681
139	57.55395683
140	57.85714286
141	58.15602837
142	58.45070423
143	58.74125874
144	58.33333333
145	58.62068966
146	58.21917808
147	58.50340136
148	58.10810811
149	58.38926174
150	58.66666667
151	58.94039735
152	59.21052632
153	59.47712418
154	59.09090909
155	58.70967742
156	58.97435897
157	59.23566879
158	59.49367089
159	59.74842767
160	59.37500000
161	59.62732919
162	59.87654321
163	59.50920245
164	59.14634146
165	59.39393939
166	59.63855422
167	59.28143713
168	59.52380952
169	59.17159763
170	59.41176471
171	59.64912281
172	59.30232558
173	59.53757225
174	59.77011494
175	59.42857143
176	59.09090909
177	59.32203390
178	58.98876404
179	59.21787709
180	58.88888889
181	58.56353591
182	58.79120879
183	58.46994536
184	58.15217391
185	58.37837838
186	58.60215054
187	58.28877005
188	57.97872340
189	57.67195767
190	57.36842105
191	57.59162304
192	57.29166667
193	56.99481865
194	57.21649485
195	57.43589744
196	57.65306122
197	57.86802030
198	58.08080808
199	58.29145729
200	58.50000000
201	58.70646766
202	58.91089109
203	58.62068966
204	58.82352941
205	58.53658537
206	58.25242718
207	58.45410628
208	58.65384615
209	58.85167464
210	59.04761905
211	59.24170616
212	58.96226415
213	58.68544601
214	58.87850467
215	59.06976744
216	58.79629630
217	58.52534562
218	58.71559633
219	58.90410959
220	58.63636364
221	58.37104072
222	58.10810811
223	57.84753363
224	58.03571429
225	57.77777778
226	57.52212389
227	57.70925110
228	57.89473684
229	57.64192140
230	57.82608696
231	57.57575758
232	57.32758621
233	57.51072961
234	57.26495726
235	57.02127660
236	57.20338983
237	56.96202532
238	57.14285714
239	57.32217573
240	57.50000000
241	57.67634855
242	57.43801653
243	57.20164609
244	56.96721311
245	56.73469388
246	56.50406504
247	56.68016194
248	56.85483871
249	57.02811245
250	57.20000000
251	57.37051793
252	57.53968254
253	57.31225296
254	57.08661417
255	57.25490196
256	57.42187500
257	57.58754864
258	57.75193798
259	57.52895753
260	57.69230769
261	57.85440613
262	58.01526718
263	58.17490494
264	58.33333333
265	58.11320755
266	57.89473684
267	58.05243446
268	57.83582090
269	57.99256506
270	58.14814815
271	57.93357934
272	57.72058824
273	57.50915751
274	57.29927007
275	57.09090909
276	57.24637681
277	57.40072202
278	57.19424460
279	56.98924731
280	56.78571429
281	56.93950178
282	57.09219858
283	56.89045936
284	57.04225352
285	56.84210526
286	56.99300699
287	57.14285714
288	57.29166667
289	57.09342561
290	57.24137931
291	57.38831615
292	57.53424658
293	57.67918089
294	57.48299320
295	57.62711864
296	57.43243243
297	57.57575758
298	57.38255034
299	57.52508361
300	57.33333333
301	57.14285714
302	57.28476821
303	57.42574257
304	57.56578947
305	57.70491803
306	57.51633987
307	57.65472313
308	57.46753247
309	57.60517799
310	57.74193548
311	57.87781350
312	57.69230769
313	57.82747604
314	57.64331210
315	57.46031746
316	57.59493671
317	57.72870662
318	57.54716981
319	57.36677116
320	57.18750000
321	57.00934579
322	57.14285714
323	56.96594427
324	56.79012346
325	56.61538462
326	56.74846626
327	56.57492355
328	56.40243902
329	56.53495441
330	56.66666667
331	56.79758308
332	56.62650602
333	56.75675676
334	56.58682635
335	56.41791045
336	56.25000000
337	56.08308605
338	55.91715976
339	56.04719764
340	56.17647059
341	56.30498534
342	56.14035088
343	56.26822157
344	56.39534884
345	56.23188406
346	56.35838150
347	56.19596542
348	56.03448276
349	56.16045845
350	56.00000000
351	56.12535613
352	55.96590909
353	55.80736544
354	55.64971751
355	55.49295775
356	55.33707865
357	55.18207283
358	55.30726257
359	55.15320334
360	55.27777778
361	55.40166205
362	55.52486188
363	55.64738292
364	55.49450549
365	55.34246575
366	55.19125683
367	55.04087193
368	55.16304348
369	55.28455285
370	55.13513514
371	54.98652291
372	54.83870968
373	54.69168901
374	54.81283422
375	54.93333333
376	54.78723404
377	54.90716180
378	55.02645503
379	54.88126649
380	55.00000000
381	54.85564304
382	54.71204188
383	54.56919060
384	54.42708333
385	54.54545455
386	54.66321244
387	54.52196382
388	54.38144330
389	54.24164524
390	54.35897436
391	54.21994885
392	54.33673469
393	54.45292621
394	54.56852792
395	54.68354430
396	54.54545455
397	54.65994962
398	54.52261307
399	54.38596491
400	54.50000000
401	54.36408978
402	54.47761194
403	54.59057072
404	54.70297030
405	54.81481481
406	54.92610837
407	55.03685504
408	55.14705882
409	55.01222494
410	55.12195122
411	55.23114355
412	55.33980583
413	55.20581114
414	55.31400966
415	55.18072289
416	55.04807692
417	54.91606715
418	54.78468900
419	54.65393795
420	54.76190476
421	54.63182898
422	54.73933649
423	54.84633570
424	54.71698113
425	54.58823529
426	54.46009390
427	54.33255269
428	54.20560748
429	54.31235431
430	54.41860465
431	54.52436195
432	54.39814815
433	54.27251732
434	54.14746544
435	54.25287356
436	54.35779817
437	54.46224256
438	54.33789954
439	54.21412301
440	54.31818182
441	54.42176871
442	54.29864253
443	54.17607223
444	54.05405405
445	54.15730337
446	54.26008969
447	54.36241611
448	54.46428571
449	54.56570156
450	54.44444444
451	54.54545455
452	54.64601770
453	54.52538631
454	54.40528634
455	54.28571429
456	54.16666667
457	54.04814004
458	53.93013100
459	54.03050109
460	54.13043478
461	54.01301518
462	54.11255411
463	53.99568035
464	53.87931034
465	53.97849462
466	54.07725322
467	53.96145610
468	54.05982906
469	53.94456290
470	54.04255319
471	54.14012739
472	54.23728814
473	54.12262156
474	54.00843882
475	53.89473684
476	53.78151261
477	53.87840671
478	53.97489540
479	54.07098121
480	53.95833333
481	53.84615385
482	53.94190871
483	54.03726708
484	54.13223140
485	54.22680412
486	54.32098765
487	54.41478439
488	54.30327869
489	54.19222904
490	54.08163265
491	54.17515275
492	54.06504065
493	53.95537525
494	54.04858300
495	53.93939394
496	53.83064516
497	53.92354125
498	54.01606426
499	54.10821643
500	54.20000000
501	54.09181637
502	54.18326693
503	54.27435388
504	54.36507937
505	54.45544554
506	54.34782609
507	54.24063116
508	54.33070866
509	54.22396857
510	54.31372549
511	54.20743640
512	54.29687500
513	54.19103314
514	54.28015564
515	54.36893204
516	54.26356589
517	54.15860735
518	54.05405405
519	53.94990366
520	53.84615385
521	53.93474088
522	54.02298851
523	53.91969407
524	53.81679389
525	53.71428571
526	53.61216730
527	53.51043643
528	53.40909091
529	53.30812854
530	53.39622642
531	53.48399247
532	53.57142857
533	53.65853659
534	53.55805243
535	53.45794393
536	53.35820896
537	53.44506518
538	53.34572491
539	53.43228200
540	53.33333333
541	53.23475046
542	53.32103321
543	53.40699816
544	53.30882353
545	53.39449541
546	53.29670330
547	53.19926874
548	53.28467153
549	53.36976321
550	53.27272727
551	53.17604356
552	53.07971014
553	53.16455696
554	53.24909747
555	53.33333333
556	53.23741007
557	53.32136445
558	53.22580645
559	53.30948122
560	53.39285714
561	53.47593583
562	53.55871886
563	53.64120782
564	53.54609929
565	53.45132743
566	53.53356890
567	53.61552028
568	53.69718310
569	53.60281195
570	53.50877193
571	53.41506130
572	53.32167832
573	53.40314136
574	53.48432056
575	53.56521739
576	53.64583333
577	53.72616984
578	53.80622837
579	53.88601036
580	53.96551724
581	53.87263339
582	53.78006873
583	53.85934820
584	53.93835616
585	54.01709402
586	54.09556314
587	54.00340716
588	53.91156463
589	53.98981324
590	54.06779661
591	54.14551607
592	54.22297297
593	54.13153457
594	54.20875421
595	54.28571429
596	54.19463087
597	54.10385260
598	54.01337793
599	54.09015025
600	54.00000000
601	54.07653910
602	53.98671096
603	53.89718076
604	53.80794702
605	53.71900826
606	53.63036304
607	53.54200988
608	53.61842105
609	53.53037767
610	53.60655738
611	53.51882160
612	53.43137255
613	53.34420881
614	53.25732899
615	53.33333333
616	53.40909091
617	53.48460292
618	53.39805825
619	53.31179321
620	53.38709677
621	53.46215781
622	53.37620579
623	53.45104334
624	53.36538462
625	53.28000000
626	53.35463259
627	53.42902711
628	53.34394904
629	53.41812401
630	53.49206349
631	53.56576862
632	53.48101266
633	53.39652449
634	53.31230284
635	53.22834646
636	53.30188679
637	53.21821036
638	53.29153605
639	53.20813772
640	53.12500000
641	53.19812793
642	53.11526480
643	53.03265941
644	53.10559006
645	53.02325581
646	52.94117647
647	52.85935085
648	52.77777778
649	52.69645609
650	52.61538462
651	52.68817204
652	52.60736196
653	52.52679939
654	52.59938838
655	52.51908397
656	52.43902439
657	52.35920852
658	52.43161094
659	52.35204856
660	52.42424242
661	52.34493192
662	52.41691843
663	52.48868778
664	52.56024096
665	52.63157895
666	52.70270270
667	52.77361319
668	52.69461078
669	52.61584454
670	52.68656716
671	52.60804769
672	52.52976190
673	52.45170877
674	52.37388724
675	52.29629630
676	52.21893491
677	52.28951256
678	52.35988201
679	52.43004418
680	52.50000000
681	52.42290749
682	52.49266862
683	52.41581259
684	52.33918129
685	52.40875912
686	52.33236152
687	52.25618632
688	52.32558140
689	52.39477504
690	52.46376812
691	52.53256151
692	52.60115607
693	52.66955267
694	52.73775216
695	52.80575540
696	52.72988506
697	52.65423242
698	52.57879656
699	52.50357654
700	52.57142857
701	52.49643367
702	52.56410256
703	52.48933144
704	52.55681818
705	52.62411348
706	52.69121813
707	52.75813296
708	52.82485876
709	52.89139633
710	52.95774648
711	53.02390999
712	53.08988764
713	53.15568022
714	53.08123249
715	53.00699301
716	53.07262570
717	52.99860530
718	52.92479109
719	52.85118220
720	52.91666667
721	52.84327323
722	52.90858726
723	52.97372061
724	53.03867403
725	53.10344828
726	53.03030303
727	52.95735901
728	53.02197802
729	52.94924554
730	53.01369863
731	53.07797538
732	53.14207650
733	53.20600273
734	53.26975477
735	53.19727891
736	53.26086957
737	53.18860244
738	53.11653117
739	53.17997294
740	53.10810811
741	53.17139001
742	53.23450135
743	53.29744280
744	53.22580645
745	53.28859060
746	53.21715818
747	53.14591700
748	53.07486631
749	53.00400534
750	53.06666667

Final result: 53.0667 ±1.8235
Random chance: 25.0083 ±1.5824