File size: 12,594 Bytes
94d6d71
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
common_init_from_params: setting dry_penalty_last_n to ctx_size = 768
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

system_info: n_threads = 6 (n_threads_batch = 6) / 12 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | AARCH64_REPACK = 1 |
multiple_choice_score: there are 869 tasks in prompt
multiple_choice_score: selecting 750 random tasks from 869 tasks available
multiple_choice_score: preparing task data...done
multiple_choice_score : calculating TruthfulQA score over 750 tasks.

task	acc_norm
1	100.00000000
2	50.00000000
3	66.66666667
4	50.00000000
5	60.00000000
6	66.66666667
7	57.14285714
8	62.50000000
9	55.55555556
10	50.00000000
11	45.45454545
12	41.66666667
13	38.46153846
14	42.85714286
15	46.66666667
16	43.75000000
17	47.05882353
18	44.44444444
19	47.36842105
20	45.00000000
21	47.61904762
22	50.00000000
23	47.82608696
24	50.00000000
25	48.00000000
26	46.15384615
27	48.14814815
28	46.42857143
29	44.82758621
30	46.66666667
31	45.16129032
32	43.75000000
33	45.45454545
34	44.11764706
35	42.85714286
36	41.66666667
37	43.24324324
38	44.73684211
39	43.58974359
40	45.00000000
41	46.34146341
42	45.23809524
43	46.51162791
44	45.45454545
45	44.44444444
46	45.65217391
47	46.80851064
48	45.83333333
49	44.89795918
50	46.00000000
51	45.09803922
52	46.15384615
53	47.16981132
54	46.29629630
55	45.45454545
56	46.42857143
57	47.36842105
58	48.27586207
59	47.45762712
60	48.33333333
61	49.18032787
62	50.00000000
63	50.79365079
64	50.00000000
65	49.23076923
66	50.00000000
67	49.25373134
68	50.00000000
69	50.72463768
70	50.00000000
71	49.29577465
72	48.61111111
73	49.31506849
74	50.00000000
75	50.66666667
76	51.31578947
77	51.94805195
78	52.56410256
79	53.16455696
80	52.50000000
81	51.85185185
82	52.43902439
83	53.01204819
84	53.57142857
85	54.11764706
86	53.48837209
87	54.02298851
88	54.54545455
89	55.05617978
90	55.55555556
91	56.04395604
92	56.52173913
93	55.91397849
94	55.31914894
95	55.78947368
96	55.20833333
97	54.63917526
98	54.08163265
99	54.54545455
100	54.00000000
101	53.46534653
102	53.92156863
103	53.39805825
104	53.84615385
105	54.28571429
106	53.77358491
107	54.20560748
108	53.70370370
109	53.21100917
110	53.63636364
111	54.05405405
112	54.46428571
113	53.98230088
114	53.50877193
115	53.91304348
116	53.44827586
117	53.84615385
118	54.23728814
119	53.78151261
120	54.16666667
121	53.71900826
122	54.09836066
123	54.47154472
124	54.03225806
125	54.40000000
126	54.76190476
127	55.11811024
128	54.68750000
129	54.26356589
130	53.84615385
131	54.19847328
132	53.78787879
133	54.13533835
134	53.73134328
135	53.33333333
136	52.94117647
137	53.28467153
138	53.62318841
139	53.95683453
140	53.57142857
141	53.90070922
142	54.22535211
143	54.54545455
144	54.16666667
145	54.48275862
146	54.10958904
147	54.42176871
148	54.05405405
149	54.36241611
150	54.66666667
151	54.96688742
152	55.26315789
153	55.55555556
154	55.19480519
155	54.83870968
156	55.12820513
157	55.41401274
158	55.69620253
159	55.97484277
160	55.62500000
161	55.90062112
162	56.17283951
163	55.82822086
164	55.48780488
165	55.15151515
166	55.42168675
167	55.08982036
168	55.35714286
169	55.02958580
170	55.29411765
171	55.55555556
172	55.23255814
173	55.49132948
174	55.17241379
175	54.85714286
176	54.54545455
177	54.80225989
178	54.49438202
179	54.74860335
180	54.44444444
181	54.14364641
182	54.39560440
183	54.09836066
184	53.80434783
185	54.05405405
186	54.30107527
187	54.54545455
188	54.25531915
189	53.96825397
190	53.68421053
191	53.92670157
192	53.64583333
193	53.36787565
194	53.60824742
195	53.84615385
196	54.08163265
197	54.31472081
198	54.54545455
199	54.77386935
200	55.00000000
201	55.22388060
202	55.44554455
203	55.17241379
204	55.39215686
205	55.12195122
206	54.85436893
207	55.07246377
208	55.28846154
209	55.50239234
210	55.71428571
211	55.45023697
212	55.18867925
213	55.39906103
214	55.60747664
215	55.81395349
216	55.55555556
217	55.29953917
218	55.50458716
219	55.70776256
220	55.45454545
221	55.20361991
222	54.95495495
223	54.70852018
224	54.91071429
225	54.66666667
226	54.42477876
227	54.62555066
228	54.82456140
229	55.02183406
230	55.21739130
231	54.97835498
232	54.74137931
233	54.93562232
234	54.70085470
235	54.46808511
236	54.66101695
237	54.43037975
238	54.62184874
239	54.81171548
240	55.00000000
241	54.77178423
242	54.54545455
243	54.32098765
244	54.09836066
245	53.87755102
246	53.65853659
247	53.84615385
248	53.62903226
249	53.41365462
250	53.60000000
251	53.78486056
252	53.96825397
253	53.75494071
254	53.54330709
255	53.72549020
256	53.90625000
257	54.08560311
258	54.26356589
259	54.05405405
260	54.23076923
261	54.40613027
262	54.58015267
263	54.75285171
264	54.92424242
265	55.09433962
266	54.88721805
267	55.05617978
268	54.85074627
269	55.01858736
270	55.18518519
271	54.98154982
272	54.77941176
273	54.57875458
274	54.37956204
275	54.18181818
276	54.34782609
277	54.51263538
278	54.31654676
279	54.48028674
280	54.64285714
281	54.80427046
282	54.96453901
283	54.77031802
284	54.92957746
285	54.73684211
286	54.89510490
287	54.70383275
288	54.86111111
289	54.67128028
290	54.48275862
291	54.63917526
292	54.45205479
293	54.60750853
294	54.42176871
295	54.57627119
296	54.39189189
297	54.54545455
298	54.36241611
299	54.51505017
300	54.33333333
301	54.15282392
302	54.30463576
303	54.45544554
304	54.27631579
305	54.42622951
306	54.24836601
307	54.39739414
308	54.54545455
309	54.69255663
310	54.83870968
311	54.98392283
312	55.12820513
313	55.27156550
314	55.09554140
315	54.92063492
316	55.06329114
317	55.20504732
318	55.03144654
319	54.85893417
320	54.68750000
321	54.51713396
322	54.65838509
323	54.48916409
324	54.32098765
325	54.15384615
326	54.29447853
327	54.12844037
328	53.96341463
329	54.10334347
330	54.24242424
331	54.38066465
332	54.21686747
333	54.35435435
334	54.19161677
335	54.02985075
336	54.16666667
337	54.00593472
338	53.84615385
339	53.98230088
340	54.11764706
341	54.25219941
342	54.09356725
343	54.22740525
344	54.36046512
345	54.20289855
346	54.33526012
347	54.17867435
348	54.02298851
349	54.15472779
350	54.00000000
351	54.13105413
352	53.97727273
353	53.82436261
354	53.95480226
355	53.80281690
356	53.65168539
357	53.50140056
358	53.63128492
359	53.48189415
360	53.61111111
361	53.73961219
362	53.59116022
363	53.71900826
364	53.57142857
365	53.42465753
366	53.55191257
367	53.67847411
368	53.80434783
369	53.92953930
370	53.78378378
371	53.63881402
372	53.76344086
373	53.61930295
374	53.74331551
375	53.60000000
376	53.72340426
377	53.84615385
378	53.96825397
379	53.82585752
380	53.94736842
381	53.80577428
382	53.66492147
383	53.52480418
384	53.64583333
385	53.50649351
386	53.62694301
387	53.48837209
388	53.60824742
389	53.47043702
390	53.58974359
391	53.45268542
392	53.57142857
393	53.68956743
394	53.80710660
395	53.92405063
396	53.78787879
397	53.90428212
398	53.76884422
399	53.63408521
400	53.75000000
401	53.61596010
402	53.73134328
403	53.84615385
404	53.96039604
405	54.07407407
406	54.18719212
407	54.29975430
408	54.41176471
409	54.27872861
410	54.39024390
411	54.50121655
412	54.61165049
413	54.72154964
414	54.83091787
415	54.69879518
416	54.56730769
417	54.43645084
418	54.30622010
419	54.17661098
420	54.28571429
421	54.39429929
422	54.50236967
423	54.60992908
424	54.48113208
425	54.58823529
426	54.46009390
427	54.33255269
428	54.20560748
429	54.31235431
430	54.41860465
431	54.52436195
432	54.39814815
433	54.27251732
434	54.14746544
435	54.25287356
436	54.35779817
437	54.46224256
438	54.33789954
439	54.44191344
440	54.31818182
441	54.42176871
442	54.29864253
443	54.17607223
444	54.05405405
445	54.15730337
446	54.26008969
447	54.36241611
448	54.46428571
449	54.56570156
450	54.44444444
451	54.54545455
452	54.64601770
453	54.52538631
454	54.40528634
455	54.50549451
456	54.38596491
457	54.26695842
458	54.14847162
459	54.24836601
460	54.13043478
461	54.22993492
462	54.32900433
463	54.21166307
464	54.09482759
465	54.19354839
466	54.29184549
467	54.17558887
468	54.05982906
469	53.94456290
470	54.04255319
471	54.14012739
472	54.02542373
473	53.91120507
474	53.79746835
475	53.68421053
476	53.57142857
477	53.66876310
478	53.76569038
479	53.86221294
480	53.75000000
481	53.63825364
482	53.73443983
483	53.83022774
484	53.92561983
485	54.02061856
486	54.11522634
487	54.00410678
488	54.09836066
489	53.98773006
490	53.87755102
491	53.97148676
492	53.86178862
493	53.75253550
494	53.84615385
495	53.73737374
496	53.62903226
497	53.72233400
498	53.81526104
499	53.90781563
500	54.00000000
501	54.09181637
502	54.18326693
503	54.27435388
504	54.36507937
505	54.45544554
506	54.34782609
507	54.24063116
508	54.33070866
509	54.22396857
510	54.31372549
511	54.20743640
512	54.29687500
513	54.19103314
514	54.28015564
515	54.36893204
516	54.26356589
517	54.15860735
518	54.05405405
519	53.94990366
520	53.84615385
521	53.93474088
522	54.02298851
523	53.91969407
524	53.81679389
525	53.71428571
526	53.61216730
527	53.51043643
528	53.40909091
529	53.30812854
530	53.39622642
531	53.48399247
532	53.57142857
533	53.65853659
534	53.55805243
535	53.45794393
536	53.35820896
537	53.44506518
538	53.34572491
539	53.43228200
540	53.33333333
541	53.23475046
542	53.13653137
543	53.22283610
544	53.12500000
545	53.21100917
546	53.11355311
547	53.01645338
548	53.10218978
549	53.18761384
550	53.09090909
551	52.99455535
552	52.89855072
553	52.98372514
554	52.88808664
555	52.97297297
556	52.87769784
557	52.96229803
558	52.86738351
559	52.95169946
560	53.03571429
561	53.11942959
562	53.20284698
563	53.28596803
564	53.19148936
565	53.09734513
566	53.18021201
567	53.26278660
568	53.34507042
569	53.25131810
570	53.15789474
571	53.06479860
572	52.97202797
573	52.87958115
574	52.96167247
575	53.04347826
576	53.12500000
577	53.20623917
578	53.11418685
579	53.19516408
580	53.27586207
581	53.18416523
582	53.09278351
583	53.17324185
584	53.25342466
585	53.33333333
586	53.24232082
587	53.15161840
588	53.06122449
589	53.14091681
590	53.22033898
591	53.29949239
592	53.37837838
593	53.28836425
594	53.36700337
595	53.44537815
596	53.35570470
597	53.26633166
598	53.17725753
599	53.08848080
600	53.00000000
601	53.07820300
602	52.99003322
603	52.90215589
604	52.81456954
605	52.89256198
606	52.97029703
607	52.88303130
608	52.96052632
609	52.87356322
610	52.95081967
611	52.86415712
612	52.77777778
613	52.69168026
614	52.60586319
615	52.68292683
616	52.59740260
617	52.67423015
618	52.58899676
619	52.50403877
620	52.58064516
621	52.65700483
622	52.73311897
623	52.80898876
624	52.72435897
625	52.64000000
626	52.55591054
627	52.63157895
628	52.70700637
629	52.78219396
630	52.85714286
631	52.93185420
632	52.84810127
633	52.76461295
634	52.68138801
635	52.75590551
636	52.83018868
637	52.74725275
638	52.82131661
639	52.89514867
640	52.81250000
641	52.88611544
642	52.80373832
643	52.72161742
644	52.79503106
645	52.71317829
646	52.63157895
647	52.55023184
648	52.46913580
649	52.38828968
650	52.30769231
651	52.38095238
652	52.30061350
653	52.22052067
654	52.29357798
655	52.21374046
656	52.13414634
657	52.05479452
658	52.12765957
659	52.04855842
660	51.96969697
661	51.89107413
662	51.96374622
663	52.03619910
664	52.10843373
665	52.18045113
666	52.25225225
667	52.32383808
668	52.24550898
669	52.16741405
670	52.23880597
671	52.16095380
672	52.08333333
673	52.00594354
674	51.92878338
675	51.85185185
676	51.92307692
677	51.99409158
678	52.06489676
679	51.98821797
680	52.05882353
681	51.98237885
682	52.05278592
683	51.97657394
684	52.04678363
685	52.11678832
686	52.04081633
687	52.11062591
688	52.18023256
689	52.24963716
690	52.31884058
691	52.24312590
692	52.31213873
693	52.38095238
694	52.44956772
695	52.51798561
696	52.44252874
697	52.51076040
698	52.43553009
699	52.36051502
700	52.28571429
701	52.21112696
702	52.13675214
703	52.06258890
704	52.13068182
705	52.19858156
706	52.12464589
707	52.19236209
708	52.25988701
709	52.32722144
710	52.25352113
711	52.32067511
712	52.38764045
713	52.45441795
714	52.38095238
715	52.30769231
716	52.37430168
717	52.30125523
718	52.22841226
719	52.15577191
720	52.22222222
721	52.14979196
722	52.07756233
723	52.14384509
724	52.20994475
725	52.27586207
726	52.20385675
727	52.13204952
728	52.19780220
729	52.26337449
730	52.32876712
731	52.39398085
732	52.45901639
733	52.38744884
734	52.45231608
735	52.38095238
736	52.44565217
737	52.37449118
738	52.30352304
739	52.23274696
740	52.16216216
741	52.22672065
742	52.29110512
743	52.35531629
744	52.28494624
745	52.34899329
746	52.27882038
747	52.20883534
748	52.13903743
749	52.06942590
750	52.13333333

Final result: 52.1333 ±1.8253
Random chance: 25.0083 ±1.5824