====== Perplexity statistics ======
Mean PPL(Q)                   : 7.911577 ± 0.050533
Mean PPL(base)                : 6.554978 ± 0.040159
Cor(ln(PPL(Q)), ln(PPL(base))): 96.13%
Mean ln(PPL(Q)/PPL(base))     : 0.188102 ± 0.001761
Mean PPL(Q)/PPL(base)         : 1.206957 ± 0.002125
Mean PPL(Q)-PPL(base)         : 1.356599 ± 0.016275

====== KL divergence statistics ======
Mean    KLD:  0.194188 ± 0.000679
Maximum KLD: 10.630087
99.9%   KLD:  2.995659
99.0%   KLD:  1.123950
Median  KLD:  0.146729
10.0%   KLD:  0.006264
 5.0%   KLD:  0.001677
 1.0%   KLD:  0.000230
Minimum KLD:  0.000001

====== Token probability statistics ======
Mean    Δp: -3.300 ± 0.034 %
Maximum Δp: 82.643%
99.9%   Δp: 46.448%
99.0%   Δp: 28.068%
95.0%   Δp: 14.185%
90.0%   Δp:  7.667%
75.0%   Δp:  0.669%
Median  Δp: -0.491%
25.0%   Δp: -6.343%
10.0%   Δp: -17.979%
 5.0%   Δp: -26.867%
 1.0%   Δp: -48.683%
 0.1%   Δp: -83.954%
Minimum Δp: -98.929%
RMS Δp    : 13.302 ± 0.054 %
Same top p: 76.728 ± 0.111 %
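To make the report easier to read, here is a minimal sketch of the standard definitions behind the main summary lines. It is not the tool's own code and may differ in details such as chunking and error estimation. It assumes that for every evaluated position you have the base model's next-token distribution `p_base`, the quantized model's distribution `p_q` (both plain lists of probabilities over the same vocabulary), and the index `target` of the actual next token. The helper names (`kl_divergence`, `token_stats`, `summarize`) are hypothetical. The sketch uses KL(base || Q) in nats per position, defines Δp as the change in the probability assigned to the actual next token, and treats "same top p" as the fraction of positions where both models agree on the most likely token.

```python
import math

def kl_divergence(p_base, p_q):
    """KL(P_base || P_q) for one position, in nats (assumed direction)."""
    return sum(pb * math.log(pb / pq) for pb, pq in zip(p_base, p_q) if pb > 0)

def argmax(p):
    """Index of the most probable token."""
    return max(range(len(p)), key=p.__getitem__)

def token_stats(p_base, p_q, target):
    """Per-position KLD, Δp for the true next token, and top-token agreement."""
    kld = kl_divergence(p_base, p_q)
    delta_p = p_q[target] - p_base[target]  # negative => Q is less confident in the true token
    same_top = argmax(p_q) == argmax(p_base)
    return kld, delta_p, same_top

def summarize(positions):
    """positions: list of (p_base, p_q, target) tuples (hypothetical input format)."""
    n = len(positions)
    klds, dps, tops = [], [], []
    nll_base = nll_q = 0.0
    for p_base, p_q, target in positions:
        kld, dp, top = token_stats(p_base, p_q, target)
        klds.append(kld)
        dps.append(dp)
        tops.append(top)
        # Accumulate negative log-likelihood of the true token for each model.
        nll_base -= math.log(p_base[target])
        nll_q -= math.log(p_q[target])
    ppl_base = math.exp(nll_base / n)
    ppl_q = math.exp(nll_q / n)
    return {
        "ppl_base": ppl_base,
        "ppl_q": ppl_q,
        "ln_ppl_ratio": math.log(ppl_q / ppl_base),  # equals mean(ln p_base - ln p_q)
        "mean_kld": sum(klds) / n,
        "mean_delta_p": sum(dps) / n,               # fraction; the report prints percent
        "rms_delta_p": math.sqrt(sum(d * d for d in dps) / n),
        "same_top_p": sum(tops) / n,                # fraction; the report prints percent
    }

if __name__ == "__main__":
    stats = summarize([
        ([0.5, 0.5], [0.5, 0.5], 0),  # identical distributions: no divergence
        ([0.8, 0.2], [0.6, 0.4], 0),  # Q shifts mass away from the true token
    ])
    for name, value in stats.items():
        print(f"{name:>13}: {value:.6f}")
```

Under these definitions the ratio lines in the report are consistent with each other: 7.911577 / 6.554978 ≈ 1.206957, and ln(1.206957) ≈ 0.188102, matching "Mean PPL(Q)/PPL(base)" and "Mean ln(PPL(Q)/PPL(base))". A negative mean Δp, as in the report, means the quantized model on average assigns less probability to the correct next token than the base model.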