bodo-gpt2-clm-setencepiece / tokenizer_config.json
Sanjib Narzary
tokenizer sentence piece 52k and gpt2 config added
fd0ac73
{
  "clean_up_tokenization_spaces": true,
  "model_max_length": 512,
  "special_tokens": [
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<cls>",
    "<sep>",
    "<mask>"
  ],
  "tokenizer_class": "PreTrainedTokenizerFast"
}
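
A minimal sketch of how this config could be read and sanity-checked, using only the Python standard library. The `CONFIG_TEXT` constant and the `load_config` helper are hypothetical names introduced for illustration; they are not part of the repository. The checks (unique special tokens, positive `model_max_length`) are assumptions about what is worth validating, not requirements stated by the file.

```python
import json

# Contents of tokenizer_config.json, copied from the file above.
CONFIG_TEXT = """
{
  "clean_up_tokenization_spaces": true,
  "model_max_length": 512,
  "special_tokens": ["<s>", "<pad>", "</s>", "<unk>", "<cls>", "<sep>", "<mask>"],
  "tokenizer_class": "PreTrainedTokenizerFast"
}
"""


def load_config(text: str) -> dict:
    """Parse the config (hypothetical helper) and run basic sanity checks."""
    config = json.loads(text)
    tokens = config["special_tokens"]
    # Assumption: a repeated special token would indicate a mistake.
    if len(tokens) != len(set(tokens)):
        raise ValueError("duplicate special tokens")
    # Assumption: the maximum sequence length must be a positive integer.
    if config["model_max_length"] <= 0:
        raise ValueError("model_max_length must be positive")
    return config


config = load_config(CONFIG_TEXT)
print(config["tokenizer_class"])
print(config["model_max_length"])
print(len(config["special_tokens"]))
```

To build the tokenizer itself, one would typically call `AutoTokenizer.from_pretrained` from the `transformers` library on a local clone of the repository, assuming the clone also contains the serialized tokenizer file that `PreTrainedTokenizerFast` needs.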