48k vocab LlamaTokenizer for T5

A custom tokenizer (BEE-spoke-data/slimpajama_tok-48128-BPE-forT5) from a scaling study, adapted for T5 training.

  • Compression ratio: 3.54 (see the sketch below this list)
  • Vocabulary size: 48228
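
The card does not state how the compression ratio is measured. Assuming it means average characters per token over some reference text (an assumption, not confirmed here), a minimal sketch to recompute it on a corpus of your own:

```python
from transformers import AutoTokenizer

# Repo name taken from this card.
tokenizer = AutoTokenizer.from_pretrained(
    "BEE-spoke-data/slimpajama_tok-48128-BPE-forT5"
)

def chars_per_token(texts):
    # Assumed definition of "compression ratio": total characters / total tokens.
    total_chars = sum(len(t) for t in texts)
    total_tokens = sum(len(tokenizer.tokenize(t)) for t in texts)
    return total_chars / total_tokens

# Hypothetical sample; substitute a representative evaluation corpus.
sample = ["The quick brown fox jumps over the lazy dog."]
print(f"{chars_per_token(sample):.2f} chars/token")
```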

Tokens: ['▁In', '▁', '2', '0', '2', '3', ',', '▁Dr', '.', '▁Jane', '▁Smith', '-', 'John', 'son', '▁published', '▁groundbreaking', '▁research', '▁on', '▁quantum', '▁ent', 'ang', 'lement', ',', '▁demonstrating', '▁a', '▁', '9', '9', '.', '9', '%', '▁success', '▁rate', '▁in', '▁tele', 'port', 'ing', '▁qu', 'bits', '▁over', '▁', '1', '0', '0', 'km', '▁using', '▁her', '▁patented', "▁'", 'Q', '-', 'Link', "'", '▁technology', '.', '</s>']
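
The listing above can be reproduced with the transformers library. A minimal sketch, assuming the repo's tokenizer files load via AutoTokenizer; note that the trailing `</s>` is the T5 end-of-sequence token, appended during encoding rather than produced by the raw tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "BEE-spoke-data/slimpajama_tok-48128-BPE-forT5"
)

text = (
    "In 2023, Dr. Jane Smith-Johnson published groundbreaking research on "
    "quantum entanglement, demonstrating a 99.9% success rate in teleporting "
    "qubits over 100km using her patented 'Q-Link' technology."
)

# Encode, then map ids back to token strings; special tokens such as </s>
# are added by the encoder, matching the listing above.
ids = tokenizer(text).input_ids
print(tokenizer.convert_ids_to_tokens(ids))

# Round-trip to check that decoding recovers the original text.
print(tokenizer.decode(ids, skip_special_tokens=True))
```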
