====== Perplexity statistics ======
Mean PPL(Q)                   :   6.754714 ±   0.041611
Mean PPL(base)                :   6.554978 ±   0.040159
Cor(ln(PPL(Q)), ln(PPL(base))):  99.29%
Mean ln(PPL(Q)/PPL(base))     :   0.030016 ±   0.000731
Mean PPL(Q)/PPL(base)         :   1.030471 ±   0.000753
Mean PPL(Q)-PPL(base)         :   0.199735 ±   0.005068

====== KL divergence statistics ======
Mean    KLD:   0.033124 ±   0.000235
Maximum KLD:   6.718605
99.9%   KLD:   1.106657
99.0%   KLD:   0.315475
Median  KLD:   0.016259
10.0%   KLD:   0.000792
 5.0%   KLD:   0.000230
 1.0%   KLD:   0.000033
Minimum KLD:  -0.000025

====== Token probability statistics ======
Mean    Δp: -0.664 ± 0.014 %
Maximum Δp:  95.789%
99.9%   Δp:  26.631%
99.0%   Δp:  11.752%
95.0%   Δp:   5.423%
90.0%   Δp:   3.151%
75.0%   Δp:   0.595%
Median  Δp:  -0.035%
25.0%   Δp:  -1.345%
10.0%   Δp:  -4.724%
 5.0%   Δp:  -7.884%
 1.0%   Δp: -20.310%
 0.1%   Δp: -52.412%
Minimum Δp: -97.482%
RMS Δp    :  5.475 ± 0.047 %
Same top p: 91.545 ± 0.073 %
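
The perplexity figures above can be cross-checked against each other. The short sketch below only redoes the arithmetic on the reported values. It assumes, without confirmation from the log itself, that the reported mean ratio is the exponential of the mean log-ratio. The variable names are illustrative and are not taken from any tool.

```python
import math

# Values copied from the perplexity statistics above.
ppl_q = 6.754714           # Mean PPL(Q)
ppl_base = 6.554978        # Mean PPL(base)
mean_log_ratio = 0.030016  # Mean ln(PPL(Q)/PPL(base))

# Assumption: the reported ratio equals exp(mean log-ratio).
ratio_from_log = math.exp(mean_log_ratio)

# Independent check: ratio and difference of the two mean perplexities.
ratio_from_means = ppl_q / ppl_base
diff = ppl_q - ppl_base

print(f"exp(mean ln ratio) = {ratio_from_log:.6f}")
print(f"PPL(Q)/PPL(base)   = {ratio_from_means:.6f}")
print(f"PPL(Q)-PPL(base)   = {diff:.6f}")
```

Both ratio estimates land within rounding of the reported 1.030471, and the difference lands within rounding of the reported 0.199735, so the three summary lines are mutually consistent: the quantized model's perplexity is roughly 3% above the base model's.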