Does whisper-large-v3 work on Sagemaker?

#58
by dkincaid - opened

I've been trying to deploy this model on SageMaker, but I can't get it to work once it's deployed.

I keep getting this error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Wrong index found for \u003c|0.02|\u003e: should be None but found 50366."

I can't find much about this error anywhere, but was wondering if it had to do with a transformers version problem.

Here's the code I'm using:

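import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serializers import DataSerializer

# IAM role for the endpoint (assumes this runs where an execution role is available)
role = sagemaker.get_execution_role()

# Hub model configuration
hub = {
    'HF_MODEL_ID':'openai/whisper-large-v3',
    'HF_TASK':'automatic-speech-recognition'
}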

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role, 
)

# deploy model to SageMaker Inference
audio_serializer = DataSerializer(content_type='audio/x-audio')
predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.g4dn.xlarge', # ec2 instance type
    serializer=audio_serializer
)

I've been banging my head against this same problem all week. As best I can tell, the "Deploy this model using SageMaker SDK" instructions are incorrect.

In particular, it seems the AWS Deep Learning Containers only support transformers up to version 4.26.0, which is too old for this model.

I've been following the guides below and deploying a model.tar.gz that consists of essentially just a code/inference.py and a code/requirements.txt file, so that I can force transformers==4.36.2, which does seem to work:
https://github.com/aws/sagemaker-huggingface-inference-toolkit#-user-defined-codemodules
https://aws.amazon.com/blogs/machine-learning/hugging-face-on-amazon-sagemaker-bring-your-own-scripts-and-data/
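For anyone who wants to script the packaging step, here is a minimal sketch of building that archive with Python's standard library. The function name build_model_archive and the default archive name are my own illustrative choices, not something from the guides. It assumes inference.py and requirements.txt (with the transformers==4.36.2 pin) already sit in one local folder.

```python
import tarfile
from pathlib import Path


def build_model_archive(source_dir, archive_path="model.tar.gz"):
    """Pack inference.py and requirements.txt under code/ in a gzipped tarball.

    build_model_archive is a hypothetical helper name used for illustration.
    """
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in ("inference.py", "requirements.txt"):
            # Store each file as code/<name>, the layout described in the post.
            tar.add(str(source / name), arcname=f"code/{name}")
    return archive_path
```

The resulting model.tar.gz would then be uploaded to S3 and passed to HuggingFaceModel through its model_data argument, as the linked guides describe.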

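The inference.py pretty much just follows the getting started instructions and looks more or less like this:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32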

model_id = "openai/whisper-large-v3"
task = "automatic-speech-recognition"

def model_fn(model_dir):
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)

    processor = AutoProcessor.from_pretrained(model_id)

    return pipeline(
        task,
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        return_timestamps=True,
        torch_dtype=torch_dtype,
        device=device,
    )

And actually now that I'm thinking about it, you may be able to just get away with having the requirements.txt file and setting the HF_MODEL_ID and HF_TASK environment variables...

Wow, this is great! Thank you so much for posting this. I didn't know you could do this.

Thank you so much for your support.
I have written this up for a seamless deployment: https://dev.to/mohalbakerkaw/deploying-openais-whisper-large-v3-model-on-sagemaker-using-hugging-face-libraries-hlh

Can you help with how to pass generate_kwargs?
The goal is to set the task to transcribe.
I don't want all my transcripts to be in English.

https://huggingface.co/openai/whisper-large-v3/discussions/71
They talk about the issue in that discussion, but I'm not sure how to deal with it on SageMaker.
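One possible approach, offered as a sketch rather than a confirmed fix: the inference toolkit lets code/inference.py define a predict_fn alongside model_fn, and the transformers speech recognition pipeline accepts a generate_kwargs argument when called. The snippet below assumes the toolkit hands predict_fn a dict with the audio under "inputs" and optional settings under "parameters". DEFAULT_GENERATE_KWARGS is a name made up for this example.

```python
# Hypothetical addition to code/inference.py.
# DEFAULT_GENERATE_KWARGS is an illustrative name, not part of any library.
DEFAULT_GENERATE_KWARGS = {"task": "transcribe"}


def predict_fn(data, pipe):
    """Call the pipeline returned by model_fn, forwarding generation options."""
    # Assumption: an audio request arrives as {"inputs": <audio bytes>},
    # optionally with a "parameters" dict next to it.
    if isinstance(data, dict):
        inputs = data.get("inputs", data)
        params = data.get("parameters") or {}
    else:
        inputs, params = data, {}

    # The pipeline decodes raw audio only from bytes, so normalise bytearray.
    if isinstance(inputs, bytearray):
        inputs = bytes(inputs)

    # Per-request generate_kwargs override the defaults set above.
    generate_kwargs = {**DEFAULT_GENERATE_KWARGS, **params.get("generate_kwargs", {})}
    return pipe(inputs, generate_kwargs=generate_kwargs)
```

With task set to transcribe, the output should stay in the spoken language instead of being translated to English. I have not verified this on a live endpoint, so treat the request shape as an assumption to check against the toolkit version you deploy.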
