====== Perplexity statistics ======
Mean PPL(Q)                   : 8.262126 ± 0.052116
Mean PPL(base)                : 6.554978 ± 0.040159
Cor(ln(PPL(Q)), ln(PPL(base))): 95.48%
Mean ln(PPL(Q)/PPL(base))     : 0.231457 ± 0.001879
Mean PPL(Q)/PPL(base)         : 1.260435 ± 0.002368
Mean PPL(Q)-PPL(base)         : 1.707148 ± 0.018230

====== KL divergence statistics ======
Mean    KLD:  0.228988 ± 0.000778
Maximum KLD: 13.717445
99.9%   KLD:  3.438575
99.0%   KLD:  1.323173
Median  KLD:  0.173725
10.0%   KLD:  0.010253
 5.0%   KLD:  0.002920
 1.0%   KLD:  0.000390
Minimum KLD:  0.000003

====== Token probability statistics ======
Mean    Δp:  -5.158 ± 0.037 %
Maximum Δp:  81.093%
99.9%   Δp:  45.753%
99.0%   Δp:  25.847%
95.0%   Δp:  11.969%
90.0%   Δp:   5.788%
75.0%   Δp:   0.172%
Median  Δp:  -1.141%
25.0%   Δp:  -8.948%
10.0%   Δp: -22.123%
 5.0%   Δp: -31.922%
 1.0%   Δp: -54.774%
 0.1%   Δp: -87.519%
Minimum Δp: -99.685%
RMS Δp    :  14.837 ± 0.057 %
Same top p:  75.390 ± 0.114 %
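
The derived perplexity figures in the report can be cross-checked from the two mean perplexities. The Python sketch below does that, then shows on a toy example what the per-token quantities are usually taken to mean. It rests on assumptions not stated in the report: KLD is KL(base || Q) in nats, Δp is the quantized model's probability for the correct token minus the base model's, and "Same top p" counts positions where both models rank the same token first. The three-token distributions are invented for illustration and do not come from the report.

```python
import math

# Values copied from the perplexity section of the report.
ppl_q = 8.262126
ppl_base = 6.554978
mean_log_ratio = 0.231457

# Cross-check the derived lines.
ratio = ppl_q / ppl_base                 # compare with "Mean PPL(Q)/PPL(base)"
diff = ppl_q - ppl_base                  # compare with "Mean PPL(Q)-PPL(base)"
ratio_from_log = math.exp(mean_log_ratio)  # exp of "Mean ln(PPL(Q)/PPL(base))"


def kl_divergence(p_base, p_q):
    """KL(base || Q) in nats for one token position (assumed direction)."""
    return sum(pb * math.log(pb / pq) for pb, pq in zip(p_base, p_q) if pb > 0)


# Toy 3-token vocabulary; hypothetical numbers, not taken from the report.
p_base = [0.7, 0.2, 0.1]
p_q = [0.6, 0.3, 0.1]
correct_token = 0  # assumed index of the ground-truth token

kld = kl_divergence(p_base, p_q)
delta_p = p_q[correct_token] - p_base[correct_token]
same_top = p_base.index(max(p_base)) == p_q.index(max(p_q))

print(f"ratio={ratio:.6f} diff={diff:.6f} exp(mean ln ratio)={ratio_from_log:.6f}")
print(f"toy KLD={kld:.6f} toy delta_p={delta_p:+.3f} same top token={same_top}")
```

Under these assumptions the reported ratio and difference agree with the two mean perplexities, and the reported ratio matches the exponential of the mean log ratio. The mean Δp of about -5% and the 75.4% top-token agreement would then indicate that the quantized model assigns noticeably less probability to the correct token on average.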