====== Perplexity statistics ======
Mean PPL(Q)                   :  31.864547 ± 0.323662
Mean PPL(base)                :  24.931431 ± 0.241228
Cor(ln(PPL(Q)), ln(PPL(base))):  96.68%
Mean ln(PPL(Q)/PPL(base))     :   0.245365 ± 0.002599
Mean PPL(Q)/PPL(base)         :   1.278087 ± 0.003322
Mean PPL(Q)-PPL(base)         :   6.933116 ± 0.109449

====== KL divergence statistics ======
Mean    KLD:   0.312659 ± 0.000862
Maximum KLD:   7.268548
99.9%   KLD:   2.411516
99.0%   KLD:   1.440652
Median  KLD:   0.222315
10.0%   KLD:   0.004036
 5.0%   KLD:   0.000678
 1.0%   KLD:   0.000038
Minimum KLD:  -0.000022

====== Token probability statistics ======
Mean    Δp:  -1.655 ± 0.032 %
Maximum Δp:  92.364%
99.9%   Δp:  58.087%
99.0%   Δp:  35.088%
95.0%   Δp:  16.357%
90.0%   Δp:   8.015%
75.0%   Δp:   0.435%
Median  Δp:  -0.030%
25.0%   Δp:  -2.925%
10.0%   Δp: -14.089%
 5.0%   Δp: -24.141%
 1.0%   Δp: -46.160%
 0.1%   Δp: -70.563%
Minimum Δp: -98.312%
RMS Δp    :  12.556 ± 0.050 %
Same top p:  73.377 ± 0.115 %
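
The three derived perplexity figures can be cross-checked against the two means with a few lines of arithmetic. The sketch below is only a sanity check on the numbers reported above, not a reconstruction of how the tool computes them. It assumes that the reported ratio equals the exponential of the mean log ratio, which the figures suggest but the log itself does not state. The variable names are illustrative.

```python
import math

# Values copied from the perplexity statistics above.
ppl_q = 31.864547          # Mean PPL(Q)
ppl_base = 24.931431       # Mean PPL(base)
mean_ln_ratio = 0.245365   # Mean ln(PPL(Q)/PPL(base))

# Ratio computed directly from the two reported means.
ratio_from_means = ppl_q / ppl_base

# Ratio recovered from the mean log ratio (assumption: ratio = exp(mean ln ratio)).
ratio_from_log = math.exp(mean_ln_ratio)

# Absolute perplexity gap between the two models.
difference = ppl_q - ppl_base

print(f"PPL(Q)/PPL(base) from means: {ratio_from_means:.6f}")
print(f"exp(mean ln ratio)         : {ratio_from_log:.6f}")
print(f"PPL(Q)-PPL(base)           : {difference:.6f}")
```

Both ratio estimates should agree with the reported 1.278087 to about five decimal places, and the difference should match the reported 6.933116.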