Token indices sequence length is longer than the specified maximum sequence length for this model (269923 > 131072)

#23 opened by wasimsafdar

I am getting the above-mentioned error while using Llama 3.2 for vision. If I pass an image URL, it works fine; however, if I load the same image from a local directory, I get the error. I am using the transformers library from Hugging Face. My device is a Mac M1 Pro with 16 GB RAM.
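For reference, when it works, I pass the remote image roughly like this (the URL below is just a placeholder, not the one I actually used):

messages = [
    {"role": "user",
     "content": [
         {"type": "text", "text": "describe the image in one sentence"},
         # placeholder URL for illustration only
         {"type": "image_url", "image_url": {"url": "https://example.com/Llama_Repo.jpeg"}},
     ]
    },
]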

There is also a runtime error: "RuntimeError: MPS backend out of memory (MPS allocated: 17.08 GB, other allocations: 672.00 KB, max allowed: 18.13 GB). Tried to allocate 1.54 GB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)."

I searched Google, and people said to set "PYTORCH_MPS_HIGH_WATERMARK_RATIO" to 0.7. However, I do not know how to set it in PyTorch; most of the answers are about the Stable Diffusion WebUI.
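From what I can tell, PYTORCH_MPS_HIGH_WATERMARK_RATIO seems to be an environment variable, so I assume it would need to be set before torch is imported (or exported in the shell before running the script). Something like the sketch below, but I am not sure whether this is the right way:

import os

# Assumption: the variable has to be set before torch initialises the MPS allocator,
# so it goes before any torch/transformers import.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.7"

import torch

Or from the shell (the script name is just a placeholder):

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 python my_script.py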

This is my code.

import warnings
from transformers import pipeline
import torch
access_token = "hf_...."

warnings.filterwarnings('ignore')

model_id = "meta-llama/Llama-3.2-3B-Instruct"
pipe = pipeline(
    "text-generation",
    model=model_id,
    token=access_token,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

import base64

def encode_image(image_path):
    # Read the local file and return its contents as a base64 string
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

base64_image = encode_image("/Users/mac/Downloads/Llama_Repo.jpeg")

messages = [
    {"role": "user",
     "content": [
         {"type": "text",
          "text": "describe the image in one sentence"
         },
         {"type": "image_url",
          "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
         }
     ]
    },
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
response = outputs[0]["generated_text"][-1]["content"]
print(response)

Please guide me on how to solve this in PyTorch. Thanks.
