OSError: Can't load tokenizer for '5roop/whisper-large-v3-ParlaSpeech-HR'

#1
by ir2718 - opened

Hi,

I've tried using the model directly, but I'm encountering an error:

>>> from transformers import AutoProcessor
>>> processor = AutoProcessor.from_pretrained("5roop/whisper-large-v3-ParlaSpeech-HR")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/whisper_tryout/whisper_inference/lib/python3.11/site-packages/transformers/models/auto/processing_auto.py", line 312, in from_pretrained
    return processor_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/whisper_tryout/whisper_inference/lib/python3.11/site-packages/transformers/processing_utils.py", line 465, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/whisper_tryout/whisper_inference/lib/python3.11/site-packages/transformers/processing_utils.py", line 511, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/whisper_tryout/whisper_inference/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2032, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for '5roop/whisper-large-v3-ParlaSpeech-HR'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '5roop/whisper-large-v3-ParlaSpeech-HR' is the correct path to a directory containing all relevant files for a WhisperTokenizer tokenizer.

The same error occurs when using the pipeline function:

>>> from transformers import pipeline
>>> 
>>> pipe = pipeline("automatic-speech-recognition", model="5roop/whisper-large-v3-ParlaSpeech-HR")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/whisper_tryout/whisper_inference/lib/python3.11/site-packages/transformers/pipelines/__init__.py", line 1004, in pipeline
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/whisper_tryout/whisper_inference/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 843, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/whisper_tryout/whisper_inference/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2032, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for '5roop/whisper-large-v3-ParlaSpeech-HR'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '5roop/whisper-large-v3-ParlaSpeech-HR' is the correct path to a directory containing all relevant files for a WhisperTokenizerFast tokenizer.

Just to get this out of the way, I do not have a local directory with the same name. I've tried this out with transformers versions 4.38.2 and 4.40.2.
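With the local-directory cause ruled out, the other possibility the error message points at is that the repo does not contain the tokenizer files. Below is a rough sketch for checking that. The file names are the ones found in the stock openai/whisper-* repos, so the exact set is an assumption, and `missing_processor_files` and `check_repo` are hypothetical helper names, not part of any library.

```python
# File names as found in the stock openai/whisper-* repos (an assumption,
# not something confirmed for this particular model).
SLOW_TOKENIZER_FILES = {"vocab.json", "merges.txt"}
FAST_TOKENIZER_FILE = "tokenizer.json"
FEATURE_EXTRACTOR_FILE = "preprocessor_config.json"


def missing_processor_files(repo_files):
    """Return processor-related files that are absent from a repo file listing."""
    present = set(repo_files)
    missing = []
    # Assumption: a tokenizer can be loaded either from tokenizer.json alone
    # or from the vocab.json + merges.txt pair.
    has_fast = FAST_TOKENIZER_FILE in present
    has_slow = SLOW_TOKENIZER_FILES <= present
    if not (has_fast or has_slow):
        wanted = SLOW_TOKENIZER_FILES | {FAST_TOKENIZER_FILE}
        missing.extend(sorted(wanted - present))
    if FEATURE_EXTRACTOR_FILE not in present:
        missing.append(FEATURE_EXTRACTOR_FILE)
    return missing


def check_repo(repo_id):
    """List a Hub repo's files (needs network access) and report what is missing."""
    # Imported lazily so the pure helper above works without huggingface_hub.
    from huggingface_hub import list_repo_files

    return missing_processor_files(list_repo_files(repo_id))
```

Calling `check_repo("5roop/whisper-large-v3-ParlaSpeech-HR")` should return a non-empty list if the tokenizer files were never uploaded, which would be consistent with the error above.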

I found a solution in the meantime. In case someone stumbles on the same problem as me, have a look here:

https://github.com/5roop/mak_na_konac/blob/main/wrapping_up/scripts/transformerscript.py
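For anyone who doesn't want to dig through the script, here is a minimal sketch of one common workaround. It assumes the fine-tuned checkpoint still uses the stock Whisper large-v3 tokenizer and feature extractor, which is not confirmed in this thread, and the linked script may do things differently. `build_asr_pipeline` is a hypothetical helper name.

```python
BASE_MODEL_ID = "openai/whisper-large-v3"
FINETUNED_MODEL_ID = "5roop/whisper-large-v3-ParlaSpeech-HR"


def build_asr_pipeline(finetuned_id=FINETUNED_MODEL_ID, base_id=BASE_MODEL_ID):
    """Pair the fine-tuned weights with the base model's processor.

    Assumption: the fine-tune did not change the tokenizer or the feature
    extractor, so the base processor is a valid stand-in.
    """
    # Imported lazily; both from_pretrained calls download from the Hub.
    from transformers import (
        AutoProcessor,
        WhisperForConditionalGeneration,
        pipeline,
    )

    # Tokenizer + feature extractor come from the base repo, which has them.
    processor = AutoProcessor.from_pretrained(base_id)
    # Model weights come from the fine-tuned repo.
    model = WhisperForConditionalGeneration.from_pretrained(finetuned_id)
    return pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
    )


# Usage ("sample.wav" is a placeholder path):
#   pipe = build_asr_pipeline()
#   print(pipe("sample.wav")["text"])
```

Passing the tokenizer and feature extractor explicitly keeps `pipeline` from trying to load them from the fine-tuned repo, which is the step that fails above.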

ir2718 changed discussion status to closed

Hi, this model is the result of a quick experiment, and first tests show it is very unstable. With this in mind, we didn't bother finding the proper way to upload the processor to the model hub, and we might delete the model sometime in the future.

I'm glad you found the solution we currently use, but please take a close look at the results you obtain. We only ran an out-of-domain evaluation with it, and it proved very prone to syllable repetition, so it does not seem very useful for BCS ASR. If you already have some metrics for this model, we'd love to hear about them.

I'm constantly scanning the Hugging Face Hub for ASR models in BCS languages. My experience so far is that Whisper large-v2 and large-v3 perform the best (equally well, but with different kinds of mistakes). So as soon as I saw a Whisper model fine-tuned on ParlaSpeech, I just had to give it a go. I didn't bother calculating any metrics, as I saw a lot of repetitions in the transcripts, meaning it's probably not as good as out-of-the-box Whisper. If you manage to achieve good results in the future, please ping me. I would be very grateful.
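If someone wants a rough number for the repetition problem instead of eyeballing transcripts, something like the sketch below might do. It is a hypothetical heuristic, not a standard ASR metric, and it works on whole words, so it catches repeated words and phrases but not syllables repeated inside a single word.

```python
from collections import Counter


def repeated_ngram_ratio(text, n=3):
    """Fraction of word n-grams that repeat an n-gram seen earlier in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        # Text shorter than n words: nothing to compare.
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)
```

A value near 0 means few repeated phrases and a value near 1 means the transcript is mostly a loop. Comparing the ratio for the fine-tuned model against out-of-the-box Whisper on the same audio would be one way to put a number on the difference.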
