File size: 12,594 Bytes
94d6d71
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
common_init_from_params: setting dry_penalty_last_n to ctx_size = 768
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

system_info: n_threads = 6 (n_threads_batch = 6) / 12 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | AARCH64_REPACK = 1 |
multiple_choice_score: there are 869 tasks in prompt
multiple_choice_score: selecting 750 random tasks from 869 tasks available
multiple_choice_score: preparing task data...done
multiple_choice_score : calculating TruthfulQA score over 750 tasks.

task	acc_norm
1	100.00000000
2	50.00000000
3	66.66666667
4	50.00000000
5	60.00000000
6	66.66666667
7	71.42857143
8	75.00000000
9	66.66666667
10	60.00000000
11	54.54545455
12	50.00000000
13	46.15384615
14	50.00000000
15	53.33333333
16	50.00000000
17	52.94117647
18	50.00000000
19	47.36842105
20	45.00000000
21	47.61904762
22	50.00000000
23	47.82608696
24	50.00000000
25	48.00000000
26	46.15384615
27	48.14814815
28	46.42857143
29	44.82758621
30	46.66666667
31	45.16129032
32	43.75000000
33	42.42424242
34	41.17647059
35	40.00000000
36	38.88888889
37	37.83783784
38	39.47368421
39	38.46153846
40	40.00000000
41	41.46341463
42	42.85714286
43	44.18604651
44	43.18181818
45	42.22222222
46	43.47826087
47	44.68085106
48	43.75000000
49	42.85714286
50	44.00000000
51	43.13725490
52	44.23076923
53	45.28301887
54	46.29629630
55	45.45454545
56	46.42857143
57	47.36842105
58	48.27586207
59	47.45762712
60	48.33333333
61	49.18032787
62	48.38709677
63	49.20634921
64	48.43750000
65	47.69230769
66	48.48484848
67	47.76119403
68	47.05882353
69	47.82608696
70	47.14285714
71	46.47887324
72	45.83333333
73	46.57534247
74	47.29729730
75	48.00000000
76	48.68421053
77	49.35064935
78	50.00000000
79	49.36708861
80	48.75000000
81	48.14814815
82	48.78048780
83	48.19277108
84	48.80952381
85	49.41176471
86	48.83720930
87	49.42528736
88	50.00000000
89	50.56179775
90	51.11111111
91	51.64835165
92	51.08695652
93	50.53763441
94	50.00000000
95	50.52631579
96	50.00000000
97	49.48453608
98	48.97959184
99	49.49494949
100	49.00000000
101	48.51485149
102	49.01960784
103	48.54368932
104	48.07692308
105	48.57142857
106	48.11320755
107	48.59813084
108	48.14814815
109	47.70642202
110	47.27272727
111	47.74774775
112	48.21428571
113	48.67256637
114	48.24561404
115	48.69565217
116	48.27586207
117	47.86324786
118	48.30508475
119	47.89915966
120	48.33333333
121	47.93388430
122	48.36065574
123	48.78048780
124	48.38709677
125	48.80000000
126	49.20634921
127	49.60629921
128	49.21875000
129	48.83720930
130	48.46153846
131	48.09160305
132	47.72727273
133	48.12030075
134	47.76119403
135	47.40740741
136	47.05882353
137	47.44525547
138	47.10144928
139	47.48201439
140	47.14285714
141	47.51773050
142	47.88732394
143	48.25174825
144	47.91666667
145	48.27586207
146	47.94520548
147	48.29931973
148	47.97297297
149	48.32214765
150	48.66666667
151	48.34437086
152	48.68421053
153	49.01960784
154	48.70129870
155	48.38709677
156	48.71794872
157	49.04458599
158	49.36708861
159	49.68553459
160	49.37500000
161	49.06832298
162	49.38271605
163	49.07975460
164	48.78048780
165	48.48484848
166	48.79518072
167	48.50299401
168	48.80952381
169	48.52071006
170	48.82352941
171	49.12280702
172	49.41860465
173	49.71098266
174	50.00000000
175	49.71428571
176	49.43181818
177	49.71751412
178	49.43820225
179	49.72067039
180	49.44444444
181	49.17127072
182	49.45054945
183	49.18032787
184	48.91304348
185	49.18918919
186	49.46236559
187	49.73262032
188	49.46808511
189	49.20634921
190	48.94736842
191	49.21465969
192	48.95833333
193	48.70466321
194	48.96907216
195	49.23076923
196	48.97959184
197	48.73096447
198	48.98989899
199	49.24623116
200	49.50000000
201	49.75124378
202	50.00000000
203	49.75369458
204	50.00000000
205	49.75609756
206	49.51456311
207	49.75845411
208	50.00000000
209	49.76076555
210	50.00000000
211	50.23696682
212	50.00000000
213	49.76525822
214	50.00000000
215	50.23255814
216	50.00000000
217	49.76958525
218	50.00000000
219	50.22831050
220	50.00000000
221	49.77375566
222	49.54954955
223	49.32735426
224	49.55357143
225	49.33333333
226	49.11504425
227	49.33920705
228	49.56140351
229	49.34497817
230	49.56521739
231	49.35064935
232	49.13793103
233	49.35622318
234	49.14529915
235	48.93617021
236	48.72881356
237	48.52320675
238	48.73949580
239	48.95397490
240	49.16666667
241	48.96265560
242	48.76033058
243	48.55967078
244	48.36065574
245	48.57142857
246	48.37398374
247	48.58299595
248	48.38709677
249	48.59437751
250	48.80000000
251	49.00398406
252	49.20634921
253	49.40711462
254	49.21259843
255	49.41176471
256	49.60937500
257	49.80544747
258	50.00000000
259	49.80694981
260	49.61538462
261	49.80842912
262	50.00000000
263	50.19011407
264	50.37878788
265	50.56603774
266	50.37593985
267	50.56179775
268	50.37313433
269	50.55762082
270	50.74074074
271	50.92250923
272	50.73529412
273	50.54945055
274	50.36496350
275	50.18181818
276	50.36231884
277	50.54151625
278	50.35971223
279	50.17921147
280	50.35714286
281	50.53380783
282	50.70921986
283	50.53003534
284	50.70422535
285	50.52631579
286	50.69930070
287	50.52264808
288	50.69444444
289	50.86505190
290	50.68965517
291	50.85910653
292	50.68493151
293	50.85324232
294	50.68027211
295	50.84745763
296	50.67567568
297	50.84175084
298	50.67114094
299	50.83612040
300	50.66666667
301	50.49833887
302	50.66225166
303	50.82508251
304	50.65789474
305	50.81967213
306	50.65359477
307	50.81433225
308	50.97402597
309	51.13268608
310	51.29032258
311	51.44694534
312	51.28205128
313	51.43769968
314	51.27388535
315	51.11111111
316	51.26582278
317	51.41955836
318	51.25786164
319	51.09717868
320	50.93750000
321	50.77881620
322	50.93167702
323	50.77399381
324	50.61728395
325	50.46153846
326	50.61349693
327	50.45871560
328	50.30487805
329	50.15197568
330	50.30303030
331	50.45317221
332	50.30120482
333	50.45045045
334	50.29940120
335	50.14925373
336	50.00000000
337	49.85163205
338	49.70414201
339	49.85250737
340	50.00000000
341	50.14662757
342	50.00000000
343	50.14577259
344	50.29069767
345	50.14492754
346	50.28901734
347	50.14409222
348	50.00000000
349	50.14326648
350	50.00000000
351	50.14245014
352	50.28409091
353	50.14164306
354	50.00000000
355	49.85915493
356	49.71910112
357	49.57983193
358	49.72067039
359	49.58217270
360	49.72222222
361	49.86149584
362	49.72375691
363	49.86225895
364	49.72527473
365	49.58904110
366	49.45355191
367	49.59128065
368	49.45652174
369	49.59349593
370	49.45945946
371	49.32614555
372	49.46236559
373	49.32975871
374	49.46524064
375	49.33333333
376	49.46808511
377	49.60212202
378	49.73544974
379	49.60422164
380	49.73684211
381	49.60629921
382	49.47643979
383	49.34725849
384	49.21875000
385	49.09090909
386	49.22279793
387	49.09560724
388	49.22680412
389	49.10025707
390	49.23076923
391	49.10485934
392	49.23469388
393	49.36386768
394	49.49238579
395	49.62025316
396	49.49494949
397	49.62216625
398	49.49748744
399	49.37343358
400	49.50000000
401	49.37655860
402	49.50248756
403	49.62779156
404	49.75247525
405	49.87654321
406	50.00000000
407	50.12285012
408	50.24509804
409	50.12224939
410	50.24390244
411	50.36496350
412	50.24271845
413	50.12106538
414	50.24154589
415	50.12048193
416	50.00000000
417	49.88009592
418	49.76076555
419	49.64200477
420	49.76190476
421	49.64370546
422	49.76303318
423	49.88179669
424	49.76415094
425	49.88235294
426	49.76525822
427	49.64871194
428	49.53271028
429	49.65034965
430	49.53488372
431	49.65197216
432	49.53703704
433	49.42263279
434	49.30875576
435	49.42528736
436	49.31192661
437	49.42791762
438	49.31506849
439	49.43052392
440	49.31818182
441	49.43310658
442	49.32126697
443	49.20993228
444	49.09909910
445	49.21348315
446	49.32735426
447	49.44071588
448	49.55357143
449	49.66592428
450	49.55555556
451	49.44567627
452	49.33628319
453	49.22737307
454	49.11894273
455	49.23076923
456	49.12280702
457	49.01531729
458	48.90829694
459	49.01960784
460	48.91304348
461	49.02386117
462	49.13419913
463	49.02807775
464	48.92241379
465	49.03225806
466	49.14163090
467	49.25053533
468	49.14529915
469	49.04051173
470	49.14893617
471	49.25690021
472	49.15254237
473	49.04862579
474	48.94514768
475	48.84210526
476	48.73949580
477	48.84696017
478	48.95397490
479	49.06054280
480	48.95833333
481	48.85654886
482	48.96265560
483	49.06832298
484	49.17355372
485	49.27835052
486	49.38271605
487	49.28131417
488	49.18032787
489	49.07975460
490	48.97959184
491	49.08350305
492	48.98373984
493	48.88438134
494	48.98785425
495	48.88888889
496	48.79032258
497	48.89336016
498	48.99598394
499	49.09819639
500	49.20000000
501	49.30139721
502	49.40239044
503	49.50298211
504	49.60317460
505	49.70297030
506	49.60474308
507	49.50690335
508	49.60629921
509	49.50884086
510	49.60784314
511	49.51076321
512	49.60937500
513	49.51267057
514	49.61089494
515	49.70873786
516	49.61240310
517	49.51644101
518	49.61389961
519	49.51830443
520	49.42307692
521	49.52015355
522	49.42528736
523	49.33078394
524	49.23664122
525	49.14285714
526	49.04942966
527	48.95635674
528	48.86363636
529	48.77126654
530	48.86792453
531	48.96421846
532	49.06015038
533	49.15572233
534	49.06367041
535	48.97196262
536	48.88059701
537	48.97579143
538	48.88475836
539	48.97959184
540	48.88888889
541	48.79852126
542	48.70848708
543	48.80294659
544	48.71323529
545	48.80733945
546	48.71794872
547	48.62888483
548	48.72262774
549	48.81602914
550	48.72727273
551	48.63883848
552	48.55072464
553	48.64376130
554	48.55595668
555	48.64864865
556	48.56115108
557	48.47396768
558	48.38709677
559	48.47942755
560	48.57142857
561	48.66310160
562	48.75444840
563	48.84547069
564	48.75886525
565	48.67256637
566	48.76325088
567	48.85361552
568	48.94366197
569	48.85764499
570	48.77192982
571	48.68651489
572	48.60139860
573	48.51657941
574	48.60627178
575	48.69565217
576	48.78472222
577	48.87348354
578	48.78892734
579	48.87737478
580	48.96551724
581	48.88123924
582	48.79725086
583	48.71355060
584	48.80136986
585	48.88888889
586	48.80546075
587	48.72231687
588	48.63945578
589	48.72665535
590	48.81355932
591	48.90016920
592	48.98648649
593	48.90387858
594	48.98989899
595	49.07563025
596	48.99328859
597	48.91122278
598	48.82943144
599	48.91485810
600	48.83333333
601	48.91846922
602	48.83720930
603	48.75621891
604	48.67549669
605	48.59504132
606	48.67986799
607	48.76441516
608	48.84868421
609	48.76847291
610	48.85245902
611	48.77250409
612	48.69281046
613	48.61337684
614	48.53420195
615	48.61788618
616	48.53896104
617	48.62236629
618	48.54368932
619	48.46526656
620	48.54838710
621	48.63123994
622	48.71382637
623	48.79614767
624	48.71794872
625	48.64000000
626	48.56230032
627	48.64433812
628	48.72611465
629	48.80763116
630	48.88888889
631	48.96988906
632	48.89240506
633	48.81516588
634	48.73817035
635	48.81889764
636	48.89937107
637	48.82260597
638	48.90282132
639	48.98278560
640	48.90625000
641	48.98595944
642	48.90965732
643	48.83359253
644	48.91304348
645	48.83720930
646	48.76160991
647	48.68624420
648	48.61111111
649	48.53620955
650	48.46153846
651	48.54070661
652	48.46625767
653	48.39203675
654	48.47094801
655	48.39694656
656	48.47560976
657	48.40182648
658	48.48024316
659	48.40667678
660	48.33333333
661	48.26021180
662	48.33836858
663	48.41628959
664	48.49397590
665	48.57142857
666	48.64864865
667	48.72563718
668	48.65269461
669	48.57997010
670	48.65671642
671	48.58420268
672	48.51190476
673	48.43982169
674	48.36795252
675	48.29629630
676	48.37278107
677	48.44903988
678	48.52507375
679	48.45360825
680	48.38235294
681	48.31130690
682	48.38709677
683	48.31625183
684	48.24561404
685	48.32116788
686	48.25072886
687	48.32605531
688	48.40116279
689	48.33091437
690	48.40579710
691	48.33574530
692	48.41040462
693	48.48484848
694	48.55907781
695	48.63309353
696	48.56321839
697	48.49354376
698	48.42406877
699	48.35479256
700	48.28571429
701	48.35948645
702	48.29059829
703	48.22190612
704	48.29545455
705	48.36879433
706	48.44192635
707	48.51485149
708	48.58757062
709	48.66008463
710	48.73239437
711	48.80450070
712	48.87640449
713	48.94810659
714	48.87955182
715	48.81118881
716	48.74301676
717	48.67503487
718	48.60724234
719	48.53963839
720	48.61111111
721	48.54368932
722	48.47645429
723	48.54771784
724	48.61878453
725	48.68965517
726	48.62258953
727	48.55570839
728	48.62637363
729	48.69684499
730	48.76712329
731	48.83720930
732	48.90710383
733	48.97680764
734	49.04632153
735	48.97959184
736	49.04891304
737	48.98236092
738	48.91598916
739	48.98511502
740	48.91891892
741	48.98785425
742	49.05660377
743	49.12516824
744	49.05913978
745	49.12751678
746	49.06166220
747	48.99598394
748	48.93048128
749	48.86515354
750	48.93333333

Final result: 48.9333 ±1.8265
Random chance: 25.0083 ±1.5824