====== Perplexity statistics ======
Mean PPL(Q)                   :  32.110566 ±   0.327098
Mean PPL(base)                :  24.931431 ±   0.241228
Cor(ln(PPL(Q)), ln(PPL(base))):  96.81%
Mean ln(PPL(Q)/PPL(base))     :   0.253056 ±   0.002558
Mean PPL(Q)/PPL(base)         :   1.287955 ±   0.003295
Mean PPL(Q)-PPL(base)         :   7.179135 ±   0.111368

====== KL divergence statistics ======
Mean    KLD:   0.302751 ±   0.000829
Maximum KLD:   5.970398
99.9%   KLD:   2.277307
99.0%   KLD:   1.386654
Median  KLD:   0.216111
10.0%   KLD:   0.003771
 5.0%   KLD:   0.000612
 1.0%   KLD:   0.000034
Minimum KLD:  -0.000013

====== Token probability statistics ======
Mean    Δp:  -1.582 ± 0.032 %
Maximum Δp:  88.233%
99.9%   Δp:  56.689%
99.0%   Δp:  34.469%
95.0%   Δp:  16.136%
90.0%   Δp:   8.038%
75.0%   Δp:   0.436%
Median  Δp:  -0.027%
25.0%   Δp:  -2.816%
10.0%   Δp: -13.759%
 5.0%   Δp: -23.729%
 1.0%   Δp: -45.043%
 0.1%   Δp: -69.818%
Minimum Δp: -99.411%
RMS Δp    :  12.323 ± 0.049 %
Same top p:  73.717 ± 0.114 %
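
As a quick sanity check, the perplexity figures above appear to be mutually consistent: the reported ratio matches both the quotient of the two mean perplexities and the exponential of the mean log ratio, and the reported difference matches their subtraction. The sketch below only re-derives these numbers from the values printed in the log. The variable names are illustrative, and it makes no claim about how the tool computes them internally.

```python
import math

# Values copied verbatim from the perplexity statistics above.
ppl_q = 32.110566           # Mean PPL(Q)
ppl_base = 24.931431        # Mean PPL(base)
mean_ln_ratio = 0.253056    # Mean ln(PPL(Q)/PPL(base))
reported_ratio = 1.287955   # Mean PPL(Q)/PPL(base)
reported_diff = 7.179135    # Mean PPL(Q)-PPL(base)

# Re-derive the ratio two ways and the difference one way.
ratio_from_means = ppl_q / ppl_base
ratio_from_log = math.exp(mean_ln_ratio)
diff_from_means = ppl_q - ppl_base

print(f"ratio from means : {ratio_from_means:.6f}")
print(f"ratio from log   : {ratio_from_log:.6f}")
print(f"diff from means  : {diff_from_means:.6f}")
```

All three derived values agree with the reported ones to within the rounding of the printed digits.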