====== Perplexity statistics ======
Mean PPL(Q)                   :   9.087639 ±   0.058224
Mean PPL(base)                :   7.534124 ±   0.048206
Cor(ln(PPL(Q)), ln(PPL(base))):  96.96%
Mean ln(PPL(Q)/PPL(base))     :   0.187473 ±   0.001579
Mean PPL(Q)/PPL(base)         :   1.206197 ±   0.001905
Mean PPL(Q)-PPL(base)         :   1.553515 ±   0.016466

====== KL divergence statistics ======
Mean    KLD:   0.173555 ±   0.000494
Maximum KLD:   5.439730
99.9%   KLD:   2.047228
99.0%   KLD:   0.821818
Median  KLD:   0.143569
10.0%   KLD:   0.011327
 5.0%   KLD:   0.003726
 1.0%   KLD:   0.000528
Minimum KLD:   0.000000

====== Token probability statistics ======
Mean    Δp: -4.759 ± 0.031 %
Maximum Δp: 81.197%
99.9%   Δp: 38.726%
99.0%   Δp: 21.998%
95.0%   Δp:  9.357%
90.0%   Δp:  4.162%
75.0%   Δp:  0.081%
Median  Δp: -1.171%
25.0%   Δp: -8.442%
10.0%   Δp: -19.664%
 5.0%   Δp: -27.455%
 1.0%   Δp: -44.384%
 0.1%   Δp: -74.316%
Minimum Δp: -97.360%
RMS Δp    : 12.642 ± 0.047 %
Same top p: 77.787 ± 0.110 %
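
The three derived perplexity rows above can be cross-checked against the two mean PPL values. The sketch below is only an arithmetic consistency check on the reported numbers; the variable names are illustrative, and it does not claim to reproduce how the tool computes these rows internally.

```python
import math

# Values copied from the "Perplexity statistics" block of the log.
ppl_q = 9.087639          # Mean PPL(Q)
ppl_base = 7.534124       # Mean PPL(base)
mean_ln_ratio = 0.187473  # Mean ln(PPL(Q)/PPL(base))

# Ratio and difference of the two mean perplexities.
ratio = ppl_q / ppl_base
diff = ppl_q - ppl_base

# Exponentiating the mean log-ratio should land on the same ratio,
# up to the rounding of the printed digits.
ratio_from_log = math.exp(mean_ln_ratio)

print(f"PPL(Q)/PPL(base)        = {ratio:.6f}")
print(f"PPL(Q)-PPL(base)        = {diff:.6f}")
print(f"exp(mean ln ratio)      = {ratio_from_log:.6f}")
```

The computed ratio and difference agree with the logged 1.206197 and 1.553515 to the printed precision, so the rows are internally consistent.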