_compute raises `ValueError: Asking to pad ...` if called with batch_size=1 and tokenizer w/o pad_token
#8 opened by spyysalo
The current implementation only tries to add a pad_token if batch_size > 1 (https://github.com/huggingface/evaluate/blob/main/metrics/perplexity/perplexity.py#L122), but the tokenizer invocation expects a pad_token regardless of batch_size:
>>> perplexity.compute(predictions=['Hello.'], model_id='gpt2')
{'perplexities': [394.9745178222656], 'mean_perplexity': 394.9745178222656}
>>> perplexity.compute(predictions=['Hello.'], model_id='gpt2', batch_size=1)
Traceback (most recent call last):
[...]
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.