====== Perplexity statistics ======
Mean PPL(Q) : 33.173614 ±0.334504
Mean PPL(base) : 24.931431 ±0.241228
Cor(ln(PPL(Q)), ln(PPL(base))): 96.33%
Mean ln(PPL(Q)/PPL(base)) : 0.285626 ±0.002707
Mean PPL(Q)/PPL(base) : 1.330594 ±0.003602
Mean PPL(Q)-PPL(base) : 8.242184 ±0.120934

====== KL divergence statistics ======
Mean KLD: 0.347117 ±0.000948
Maximum KLD: 6.902395
99.9% KLD: 2.647978
99.0% KLD: 1.587362
Median KLD: 0.247814
10.0% KLD: 0.005469
5.0% KLD: 0.000960
1.0% KLD: 0.000060
Minimum KLD: -0.000021

====== Token probability statistics ======
Mean Δp: -2.845 ± 0.034 %
Maximum Δp: 96.093%
99.9% Δp: 55.811%
99.0% Δp: 32.844%
95.0% Δp: 14.133%
90.0% Δp: 6.325%
75.0% Δp: 0.157%
Median Δp: -0.095%
25.0% Δp: -4.286%
10.0% Δp: -17.155%
5.0% Δp: -27.930%
1.0% Δp: -50.400%
0.1% Δp: -73.758%
Minimum Δp: -98.444%
RMS Δp : 13.391 ± 0.052 %
Same top p: 71.934 ± 0.116 %
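
How the statistics in this report relate to each other can be illustrated with a minimal sketch. It assumes the per-position logits of both the base model and the quantized model Q are available as plain Python lists, along with the token id actually observed at each position. The names compare_models, softmax, argmax and percentile are hypothetical, the KL direction (base distribution as the reference) is an assumption, and the nearest-rank percentile is a simple choice that may not match the interpolation used by the tool that produced this output.

```python
import math
import statistics


def softmax(logits):
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest entry (the model's top-1 token)."""
    return max(range(len(values)), key=values.__getitem__)


def percentile(sorted_vals, frac):
    """Nearest-rank percentile on pre-sorted data (assumed method, see lead-in)."""
    idx = min(len(sorted_vals) - 1, max(0, round(frac * (len(sorted_vals) - 1))))
    return sorted_vals[idx]


def compare_models(base_logits, q_logits, targets):
    """Summary statistics for a quantized model Q against a base model.

    base_logits / q_logits: one logit vector per evaluated position.
    targets: the token id actually observed at each position.
    """
    nll_base = nll_q = 0.0
    klds, dps = [], []
    same_top = 0
    for lb, lq, t in zip(base_logits, q_logits, targets):
        pb, pq = softmax(lb), softmax(lq)
        # Accumulate negative log-likelihood of the observed token
        nll_base -= math.log(pb[t])
        nll_q -= math.log(pq[t])
        # KL(base || Q), taking the base distribution as the reference (assumption)
        klds.append(sum(b * (math.log(b) - math.log(q)) for b, q in zip(pb, pq) if b > 0))
        # Δp: change in the probability assigned to the observed token
        dps.append(pq[t] - pb[t])
        # Count positions where both models agree on the top-1 token
        same_top += argmax(pb) == argmax(pq)
    n = len(targets)
    # Mean per-token log ratio equals ln(PPL(Q)) - ln(PPL(base)) exactly
    mean_ln_ratio = (nll_q - nll_base) / n
    klds_sorted = sorted(klds)
    return {
        "ppl_base": math.exp(nll_base / n),
        "ppl_q": math.exp(nll_q / n),
        "mean_ln_ratio": mean_ln_ratio,
        "ppl_ratio": math.exp(mean_ln_ratio),
        "mean_kld": statistics.fmean(klds),
        "median_kld": statistics.median(klds),
        "kld_99": percentile(klds_sorted, 0.99),
        "mean_dp": statistics.fmean(dps),
        "rms_dp": math.sqrt(statistics.fmean(d * d for d in dps)),
        "same_top_p": same_top / n,
    }
```

Because PPL is the exponential of the mean negative log-likelihood, the ratio row is simply exp of the mean log ratio: with the numbers above, exp(0.285626) ≈ 1.3306 and 33.173614 - 24.931431 ≈ 8.2422, consistent with the printed ratio and difference. A negative Δp mean means Q assigns less probability to the observed token on average. KL divergence is non-negative in exact arithmetic, so a tiny negative minimum such as -0.000021 can only come from floating-point rounding.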