====== Perplexity statistics ======
Mean PPL(Q) : 6.620408 ± 0.040695
Mean PPL(base) : 6.554978 ± 0.040159
Cor(ln(PPL(Q)), ln(PPL(base))): 99.76%
Mean ln(PPL(Q)/PPL(base)) : 0.009932 ± 0.000423
Mean PPL(Q)/PPL(base) : 1.009982 ± 0.000428
Mean PPL(Q)-PPL(base) : 0.065430 ± 0.002837

====== KL divergence statistics ======
Mean KLD: 0.011434 ± 0.000081
Maximum KLD: 2.556788
99.9% KLD: 0.366821
99.0% KLD: 0.101088
99.0% KLD: 0.101088
Median KLD: 0.006105
10.0% KLD: 0.000277
5.0% KLD: 0.000078
1.0% KLD: 0.000009
Minimum KLD: -0.000042

====== Token probability statistics ======
Mean Δp: -0.205 ± 0.008 %
Maximum Δp: 59.370%
99.9% Δp: 18.373%
99.0% Δp: 8.211%
95.0% Δp: 3.773%
90.0% Δp: 2.210%
75.0% Δp: 0.468%
Median Δp: -0.004%
25.0% Δp: -0.743%
10.0% Δp: -2.773%
5.0% Δp: -4.507%
1.0% Δp: -9.903%
0.1% Δp: -27.143%
Minimum Δp: -86.382%
RMS Δp : 3.175 ± 0.030 %
Same top p: 94.641 ± 0.059 %
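As a rough illustration of what the main fields above measure, here is a minimal sketch that computes them from a base model's and a quantized model Q's per-token logits. It follows the standard definitions (perplexity from mean negative log-likelihood, KL(base || Q) per token, Δp as the change in probability of the correct next token, "Same top p" as the fraction of positions where both models rank the same token first). It is not taken from the tool that produced this log, and the function names `log_softmax` and `compare` and their input layout are hypothetical. Δp is returned as a fraction here, whereas the log prints it in percent. One consistency check on the log itself: exp(0.009932) ≈ 1.00998, matching the reported mean PPL(Q)/PPL(base).

```python
import math
import statistics


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def compare(base_logits, q_logits, targets):
    """Hypothetical helper: compare base model vs. quantized model Q.

    base_logits, q_logits: per-position logit vectors over the same vocabulary.
    targets: index of the actual next token at each position.
    """
    n = len(targets)
    nll_base = nll_q = 0.0
    klds, dps, same_top = [], [], 0
    for lb, lq, t in zip(base_logits, q_logits, targets):
        lpb, lpq = log_softmax(lb), log_softmax(lq)
        # Accumulate negative log-likelihood of the correct token.
        nll_base -= lpb[t]
        nll_q -= lpq[t]
        # KL(base || Q) = sum_i p_base(i) * (ln p_base(i) - ln p_Q(i))
        klds.append(sum(math.exp(a) * (a - b) for a, b in zip(lpb, lpq)))
        # Δp: change in probability assigned to the correct token.
        dps.append(math.exp(lpq[t]) - math.exp(lpb[t]))
        # "Same top p": both models put the same token first.
        same_top += argmax(lb) == argmax(lq)
    return {
        "ppl_base": math.exp(nll_base / n),
        "ppl_q": math.exp(nll_q / n),
        # Mean ln(PPL(Q)/PPL(base)) = mean per-token NLL difference.
        "ln_ppl_ratio": (nll_q - nll_base) / n,
        "mean_kld": sum(klds) / n,
        "median_kld": statistics.median(klds),
        "mean_dp": sum(dps) / n,
        "rms_dp": math.sqrt(sum(d * d for d in dps) / n),
        "same_top": same_top / n,
    }
```

With identical logits for both models, the KLD, Δp and log-ratio are all zero and "Same top p" is 1.0; any divergence pushes the mean KLD above zero. Small negative KLD values, like the minimum reported above, can only arise from floating-point rounding, since the exact quantity is non-negative.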