Model fails to load with AutoModelForCausalLM
I see that the model was updated about 16 hours ago. Loading it with AutoModelForCausalLM now fails with the traceback below. Could you take a look?
>>> model = AutoModelForCausalLM.from_pretrained("Xenova/llama2.c-stories15M", device_map="auto")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 262, in _wrapper
return func(*args, **kwargs)
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4397, in from_pretrained
dispatch_model(model, **device_map_kwargs)
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/accelerate/big_modeling.py", line 496, in dispatch_model
model.to(device)
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3162, in to
return super().to(*args, **kwargs)
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1343, in to
return self._apply(convert)
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
module._apply(fn)
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
module._apply(fn)
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 930, in _apply
param_applied = fn(param)
File "/home/gohashi/tmp2/llm-compressor/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1336, in convert
raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Confirmed that the parameter left on the meta device is lm_head.weight, and that the error does not occur when setting use_safetensors=False.
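For reference, the corresponding call with that workaround applied (same model and device_map as above, just disabling safetensors so the .bin weights are used instead):

>>> model = AutoModelForCausalLM.from_pretrained("Xenova/llama2.c-stories15M", device_map="auto", use_safetensors=False)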
>>> model = AutoModelForCausalLM.from_pretrained("Xenova/llama2.c-stories15M")
seems to work for me (without device_map). Strange.
Indeed that seems to be it!
From what I can tell, the issue is that this model does not include the embed_tokens.weight tensor in its state dict. As a result, the model is loaded with lm_head.weight on the execution device but embed_tokens.weight on the meta device.
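One way to double-check this is to list the tensor names stored in the safetensors checkpoint directly; a quick sketch, assuming the repo's single shard is named model.safetensors:

>>> from huggingface_hub import hf_hub_download
>>> from safetensors import safe_open
>>> path = hf_hub_download("Xenova/llama2.c-stories15M", "model.safetensors")  # assumed shard name
>>> with safe_open(path, framework="pt") as f:
...     print(sorted(f.keys()))  # the embed_tokens weight should be missing from this list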
Later, when model.tie_weights() is called, the output embeddings weight (lm_head.weight) is tied to the input embeddings weight (embed_tokens.weight), leaving both lm_head and embed_tokens on the meta device.
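Here is a minimal sketch of that tying step in isolation (the module shapes and names are illustrative, not the actual transformers code path):

# embed_tokens.weight is missing from the checkpoint, so it is left on the meta device;
# lm_head.weight is present, so it lands on the execution device.
import torch.nn as nn

embed_tokens = nn.Embedding(32000, 288, device="meta")  # illustrative vocab/hidden sizes
lm_head = nn.Linear(288, 32000, bias=False)

# tie_weights() effectively does: output_embeddings.weight = input_embeddings.weight
lm_head.weight = embed_tokens.weight

print(embed_tokens.weight.device, lm_head.weight.device)  # both now report meta
# A subsequent .to(device) on these modules raises the same
# "Cannot copy out of meta tensor; no data!" error seen in the traceback above.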