med_tokenizer_Unigram / token_fraction_describe.txt (uploaded by parasora, commit 337946c)
              token          text    token/text
count  19980.000000  19980.000000  19980.000000
mean      87.750000    181.553804      0.494357
std      237.185668    374.992707      0.100927
min        7.000000     41.000000      0.134615
10%       24.000000     46.000000      0.387381
25%       34.000000     64.000000      0.432000
33%       42.000000     85.000000      0.449829
50%       66.000000    139.000000      0.483740
67%       97.000000    205.000000      0.521950
75%      117.000000    245.000000      0.545455
80%      132.000000    277.000000      0.562044
90%      176.100000    371.000000      0.619048
95%      221.000000    461.000000      0.674419
99%      325.210000    666.000000      0.804878
max    32162.000000  49156.000000      1.024390
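The layout above appears to be the output of pandas `DataFrame.describe` with custom percentiles (10%, 25%, 33%, 50%, 67%, 75%, 80%, 90%, 95%, 99%). The fractional token percentiles are consistent with pandas' default linear interpolation over 19,980 rows: the 90% position is 0.9 × 19979 = 17981.1 and the 99% position is 0.99 × 19979 = 19779.21, which would give the .1 and .21 fractions when neighbouring sorted counts differ by 1. The following is a minimal, stdlib-only sketch of how such a summary could be computed. It assumes (the file does not say so) that `token` is the number of tokens per document, `text` the number of characters per document, and `token/text` their per-document ratio. The names `summarize`, `format_table`, the toy corpus, and the whitespace tokenizer are hypothetical stand-ins, not the author's code or the Unigram model.

```python
import math
from typing import Callable, Dict, List, Sequence

# Percentiles shown in the summary table, as fractions.
PERCENTILES = [0.10, 0.25, 0.33, 0.50, 0.67, 0.75, 0.80, 0.90, 0.95, 0.99]


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear interpolation between neighbouring ranks (pandas/NumPy default)."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def describe(values: Sequence[float]) -> Dict[str, float]:
    """count, mean, sample std (ddof=1), min, PERCENTILES, max."""
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1)) if n > 1 else math.nan
    stats: Dict[str, float] = {"count": float(n), "mean": mean, "std": std, "min": xs[0]}
    for q in PERCENTILES:
        stats[f"{round(q * 100)}%"] = percentile(xs, q)
    stats["max"] = xs[-1]
    return stats


def summarize(docs: Sequence[str], tokenize: Callable[[str], List[str]]) -> Dict[str, Dict[str, float]]:
    """Hypothetical column definitions: tokens per doc, characters per doc, per-doc ratio."""
    token_counts = [len(tokenize(d)) for d in docs]
    text_lengths = [len(d) for d in docs]  # assumes every document is non-empty
    ratios = [t / c for t, c in zip(token_counts, text_lengths)]
    return {"token": describe(token_counts), "text": describe(text_lengths), "token/text": describe(ratios)}


def format_table(summary: Dict[str, Dict[str, float]]) -> str:
    """Right-aligned, two-space-separated columns in the style of pandas' text output."""
    columns = list(summary)
    labels = list(summary[columns[0]])
    cells = {c: [f"{summary[c][k]:.6f}" for k in labels] for c in columns}
    widths = {c: max(len(c), *(len(v) for v in cells[c])) for c in columns}
    idx_w = max(len(k) for k in labels)
    lines = [" " * idx_w + "".join(f"  {c:>{widths[c]}}" for c in columns)]
    for i, k in enumerate(labels):
        lines.append(f"{k:<{idx_w}}" + "".join(f"  {cells[c][i]:>{widths[c]}}" for c in columns))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy corpus and whitespace tokenizer: stand-ins, not the medical corpus or the Unigram model.
    docs = ["aspirin 100 mg daily", "no known drug allergies", "chest x-ray normal"]
    print(format_table(summarize(docs, str.split)))
```

If the ratio is computed per row and then summarized, that would explain why the mean of `token/text` (0.494357) differs from mean `token` / mean `text` (87.75 / 181.553804 ≈ 0.4833): an average of ratios is generally not a ratio of averages, and the same applies to the percentile rows.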