====== Perplexity statistics ======
Mean PPL(Q)                   :   9.016990 ±   0.059441
Mean PPL(base)                :   7.534124 ±   0.048206
Cor(ln(PPL(Q)), ln(PPL(base))):  97.13%
Mean ln(PPL(Q)/PPL(base))     :   0.179668 ±   0.001569
Mean PPL(Q)/PPL(base)         :   1.196820 ±   0.001878
Mean PPL(Q)-PPL(base)         :   1.482867 ±   0.017057

====== KL divergence statistics ======
Mean    KLD:   0.157666 ±   0.000482
Maximum KLD:   9.472803
99.9%   KLD:   2.049311
99.0%   KLD:   0.765471
Median  KLD:   0.127495
10.0%   KLD:   0.007270
 5.0%   KLD:   0.002221
 1.0%   KLD:   0.000296
Minimum KLD:  -0.000001

====== Token probability statistics ======
Mean    Δp: -3.373 ± 0.030 %
Maximum Δp:  68.206%
99.9%   Δp:  40.367%
99.0%   Δp:  24.601%
95.0%   Δp:  11.563%
90.0%   Δp:   5.922%
75.0%   Δp:   0.342%
Median  Δp:  -0.591%
25.0%   Δp:  -6.307%
10.0%   Δp: -16.931%
 5.0%   Δp: -24.574%
 1.0%   Δp: -41.728%
 0.1%   Δp: -69.844%
Minimum Δp: -98.781%
RMS Δp    : 11.741 ± 0.045 %
Same top p: 78.308 ± 0.109 %
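
The perplexity figures above can be cross-checked against each other with a little arithmetic. The sketch below is only a consistency check on the reported numbers. It assumes the ratio and difference lines relate to the two mean PPL values in the obvious way, and it is not a description of how the tool computes them. All variable names are illustrative.

```python
import math

# Values copied from the "Perplexity statistics" block above.
ppl_q = 9.016990            # Mean PPL(Q)
ppl_base = 7.534124         # Mean PPL(base)
mean_log_ratio = 0.179668   # Mean ln(PPL(Q)/PPL(base))
reported_ratio = 1.196820   # Mean PPL(Q)/PPL(base)
reported_diff = 1.482867    # Mean PPL(Q)-PPL(base)

# Assumption: the reported ratio should match both the quotient of the
# two means and the exponential of the mean log-ratio.
ratio_from_means = ppl_q / ppl_base
ratio_from_log = math.exp(mean_log_ratio)
diff_from_means = ppl_q - ppl_base

print(f"ratio from means     : {ratio_from_means:.6f} (reported {reported_ratio:.6f})")
print(f"ratio from log-ratio : {ratio_from_log:.6f} (reported {reported_ratio:.6f})")
print(f"difference from means: {diff_from_means:.6f} (reported {reported_diff:.6f})")
```

Under these assumptions the three derived values agree with the reported ones to within rounding, so the perplexity block is internally consistent: PPL(Q) is roughly 20% higher than PPL(base).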