protoc error when deploying the model to a SageMaker instance
Hello,
I wanted to try this model on one of our SageMaker instances.
I tried the deployment code shown under the Deploy button, but the proposed instance ("ml.g5.2xlarge") is too small: it hits a memory error while converting the PyTorch weights to safetensors. The problem is solved by using an "ml.g5.8xlarge" instance instead.
I am using the Hugging Face LLM image, version 0.8.2 (the latest available).
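For context, here is roughly what I ran, i.e. the Deploy-button snippet with the instance type changed (a sketch; "<model-id>" stands in for this model's repository id):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Hugging Face LLM (TGI) container, version 0.8.2 as mentioned above
image_uri = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

model = HuggingFaceModel(
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "<model-id>",  # placeholder for the actual repo id
        "SM_NUM_GPUS": "1",
    },
    role=role,
)

# ml.g5.2xlarge runs out of memory during the safetensors conversion,
# so deploy on ml.g5.8xlarge instead
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.8xlarge",
    container_startup_health_check_timeout=300,
)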
After this I ran into another problem that I do not know how to solve.
The problem is the following:
Shard 0 failed to start:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 289, in get_model
return CausalLM(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py", line 469, in __init__
tokenizer = AutoTokenizer.from_pretrained(
File "/usr/src/transformers/src/transformers/models/auto/tokenization_auto.py", line 692, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
return cls._from_pretrained(
File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1975, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/usr/src/transformers/src/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
super().__init__(
File "/usr/src/transformers/src/transformers/tokenization_utils_fast.py", line 114, in __init__
fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
File "/usr/src/transformers/src/transformers/convert_slow_tokenizer.py", line 1303, in convert_slow_tokenizer
return converter_class(transformer_tokenizer).converted()
File "/usr/src/transformers/src/transformers/convert_slow_tokenizer.py", line 445, in __init__
from .utils import sentencepiece_model_pb2 as model_pb2
File "/usr/src/transformers/src/transformers/utils/sentencepiece_model_pb2.py", line 91, in <module>
_descriptor.EnumValueDescriptor(
File "/opt/conda/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 796, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
Error: ShardCannotStart
I tried setting the environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python, and the error I get is the following:
RuntimeError: Llama is supposed to be a BPE model!
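For reference, this is roughly how I applied the workaround, reusing the model definition from the snippet above (a sketch; "<model-id>" is again a placeholder), in case I set it in the wrong place:

# Same deployment as before, but with the pure-Python protobuf
# workaround from the error message added to the container environment
model = HuggingFaceModel(
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "<model-id>",  # placeholder for the actual repo id
        "SM_NUM_GPUS": "1",
        # workaround 2 suggested by the protobuf error above
        "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION": "python",
    },
    role=role,
)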
Can you give me some hints about what is going wrong here?
Is it an instance problem?
Is it a package problem?
Thanks in advance
Actually, I am not very familiar with SageMaker, so I am not entirely sure of the specific cause. However, you can try pulling the latest model files and installing the latest version of Transformers (v4.31.0). If the problem persists, we can investigate further together.
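As a quick sanity check outside SageMaker (a sketch; "<model-id>" is a placeholder for the model repository), you can verify that the tokenizer loads with the newer Transformers, or alternatively downgrade protobuf to 3.20.x as the error message suggests:

# pip install "transformers==4.31.0" sentencepiece
# (or, if you must keep an older transformers, try: pip install "protobuf<=3.20.3")
from transformers import AutoTokenizer

# If this loads without the protobuf TypeError, the tokenizer conversion
# that fails inside the TGI container should work as well
tokenizer = AutoTokenizer.from_pretrained("<model-id>")  # placeholder
print(tokenizer("hello world"))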