openai-community/gpt2-medium · Adds the tokenizer configuration file

OpenAI community org Feb 16, 2024

The tokenizer configuration file is missing or incorrect, which has caused unexpected errors since the migration of the canonical models.

Refer to the following issue for more information: transformers#29050

The following code currently reproduces the problem:

>>> from transformers import AutoTokenizer
>>> previous_tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
>>> current_tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2-medium")
>>> print(previous_tokenizer.model_max_length, current_tokenizer.model_max_length)
1000000000000000019884624838656 1024

This is the result after the fix:

>>> from transformers import AutoTokenizer
>>> previous_tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
>>> current_tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2-medium")
>>> print(previous_tokenizer.model_max_length, current_tokenizer.model_max_length)
1024 1024
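
For context, the very large number in the failing output is exactly what int(1e30) evaluates to, which suggests it is a fallback reported when no maximum length is configured for the tokenizer. The sketch below illustrates that behaviour under this assumption. The names resolve_model_max_length and ASSUMED_FALLBACK_MAX_LENGTH are hypothetical and not part of transformers, and the dictionary stands in for a tokenizer configuration whose real contents are not shown in this pull request.

```python
# Assumption: when no maximum length is configured, a very large sentinel is
# reported instead. int(1e30) evaluates to the large number seen in the
# failing output.
ASSUMED_FALLBACK_MAX_LENGTH = int(1e30)


def resolve_model_max_length(tokenizer_config: dict) -> int:
    """Hypothetical helper: return the configured limit, or the fallback."""
    value = tokenizer_config.get("model_max_length")
    return ASSUMED_FALLBACK_MAX_LENGTH if value is None else int(value)


# No limit configured: the fallback sentinel is returned.
missing = resolve_model_max_length({})

# Limit configured explicitly: the configured value is returned.
configured = resolve_model_max_length({"model_max_length": 1024})

print(missing, configured)
```

A sentinel this large effectively disables length-based truncation and warnings, which is why a missing or incorrect configuration can surface as errors only later, when an over-long input reaches the model.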

Adds tokenizer_config.json file (commit 46d27d4b)

lysandre changed pull request status to open Feb 19, 2024

lysandre changed pull request status to merged Feb 19, 2024