====== Perplexity statistics ======
Mean PPL(Q)                   : 10.242604 ± 0.066586
Mean PPL(base)                :  7.534124 ± 0.048206
Cor(ln(PPL(Q)), ln(PPL(base))): 94.81%
Mean ln(PPL(Q)/PPL(base))     :  0.307113 ± 0.002080
Mean PPL(Q)/PPL(base)         :  1.359495 ± 0.002827
Mean PPL(Q)-PPL(base)         :  2.708480 ± 0.025898

====== KL divergence statistics ======
Mean    KLD: 0.276176 ± 0.000860
Maximum KLD: 8.462747
99.9%   KLD: 3.304338
99.0%   KLD: 1.622670
Median  KLD: 0.211115
10.0%   KLD: 0.017779
 5.0%   KLD: 0.004912
 1.0%   KLD: 0.000565
Minimum KLD: 0.000002

====== Token probability statistics ======
Mean    Δp: -7.071 ± 0.041 %
Maximum Δp: 93.414%
99.9%   Δp: 45.356%
99.0%   Δp: 24.292%
95.0%   Δp: 9.695%
90.0%   Δp: 4.049%
75.0%   Δp: 0.041%
Median  Δp: -1.661%
25.0%   Δp: -11.538%
10.0%   Δp: -26.617%
 5.0%   Δp: -38.289%
 1.0%   Δp: -66.295%
 0.1%   Δp: -86.075%
Minimum Δp: -99.772%
RMS Δp    : 17.133 ± 0.061 %
Same top p: 73.659 ± 0.116 %
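These figures compare a model Q (typically a quantized variant) against its base model, token by token over the same text. The summary rows are internally consistent: exp(0.307113) ≈ 1.3595 and 10.242604 / 7.534124 ≈ 1.3595, matching Mean PPL(Q)/PPL(base), while 10.242604 - 7.534124 = 2.708480, matching Mean PPL(Q)-PPL(base). The plain-Python sketch below shows how statistics of this kind can be derived; it is not the tool's own implementation. Assumptions: the inputs are hypothetical per-position logit lists for both models plus the true next-token ids, KLD is taken as KL(base ‖ Q) with the base model as reference, perplexity is exp of the mean negative log-likelihood over all positions, and Δp is returned as a fraction (multiply by 100 for the percentages printed above). The helper names `softmax`, `argmax`, `kl_divergence` and `token_stats` are illustrative, and percentile rows and ± uncertainties are omitted.

```python
import math
from statistics import median


def softmax(logits):
    """Turn a list of raw logits into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def kl_divergence(p_base, p_q):
    """KL(base || Q) for one token position; assumes the base model is the reference."""
    return sum(pb * (math.log(pb) - math.log(pq))
               for pb, pq in zip(p_base, p_q) if pb > 0)


def token_stats(base_logits, q_logits, targets):
    """Hypothetical helper: summary statistics over aligned token positions.

    base_logits, q_logits: one logit list per position for each model.
    targets: the id of the actual next token at each position.
    """
    klds, dps = [], []
    nll_base = nll_q = 0.0
    same_top = 0
    for lb, lq, t in zip(base_logits, q_logits, targets):
        pb, pq = softmax(lb), softmax(lq)
        klds.append(kl_divergence(pb, pq))
        dps.append(pq[t] - pb[t])          # delta-p of the target token (fraction, not %)
        nll_base -= math.log(pb[t])        # accumulate negative log-likelihoods
        nll_q -= math.log(pq[t])
        same_top += int(argmax(pb) == argmax(pq))  # do both models agree on the top token?
    n = len(targets)
    ppl_base = math.exp(nll_base / n)      # perplexity = exp(mean NLL)
    ppl_q = math.exp(nll_q / n)
    return {
        "ppl_base": ppl_base,
        "ppl_q": ppl_q,
        "ln_ppl_ratio": math.log(ppl_q / ppl_base),
        "ppl_diff": ppl_q - ppl_base,
        "mean_kld": sum(klds) / n,
        "median_kld": median(klds),
        "max_kld": max(klds),
        "mean_dp": sum(dps) / n,
        "rms_dp": math.sqrt(sum(d * d for d in dps) / n),
        "same_top": same_top / n,
    }


if __name__ == "__main__":
    base = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]
    quant = [[1.5, 1.2, 0.1], [0.2, 2.5, 1.0]]
    for name, value in token_stats(base, quant, targets=[0, 1]).items():
        print(f"{name:>13}: {value:.6f}")
```

Feeding identical logits for both models gives a mean KLD of 0, a mean Δp of 0 and top-1 agreement of 1.0; any divergence shows up as a positive KLD, and a Q model that assigns less probability to the true tokens shows a negative mean Δp and a higher perplexity, as in the report above.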