protoc error when deploying the model to a SageMaker instance
Hello,
I wanted to try this model on one of our SageMaker instances.
I tried the deployment code shown under the Deploy button, but the proposed instance ("ml.g5.2xlarge") is too small: it hits a memory error while converting the PyTorch weights to safetensors. The problem is solved by using an "ml.g5.8xlarge" instance instead.
I am using the Hugging Face LLM image, version 0.8.2 (the latest available).
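For context, here is roughly what I ran, i.e. the Deploy-button snippet with the instance type changed (a sketch; "<model-id>" stands in for this model's repository id):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Hugging Face LLM (TGI) container, version 0.8.2 as mentioned above
image_uri = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

model = HuggingFaceModel(
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "<model-id>",  # placeholder for the actual repo id
        "SM_NUM_GPUS": "1",
    },
    role=role,
)

# ml.g5.2xlarge runs out of memory during the safetensors conversion,
# so deploy on ml.g5.8xlarge instead
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.8xlarge",
    container_startup_health_check_timeout=300,
)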
After this I ran into another problem that I do not know how to solve.
The problem is the following:
Shard 0 failed to start:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 289, in get_model
return CausalLM(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py", line 469, in __init__
tokenizer = AutoTokenizer.from_pretrained(
File "/usr/src/transformers/src/transformers/models/auto/tokenization_auto.py", line 692, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
return cls._from_pretrained(
File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1975, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/usr/src/transformers/src/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
super().__init__(
File "/usr/src/transformers/src/transformers/tokenization_utils_fast.py", line 114, in __init__
fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
File "/usr/src/transformers/src/transformers/convert_slow_tokenizer.py", line 1303, in convert_slow_tokenizer
return converter_class(transformer_tokenizer).converted()
File "/usr/src/transformers/src/transformers/convert_slow_tokenizer.py", line 445, in __init__
from .utils import sentencepiece_model_pb2 as model_pb2
File "/usr/src/transformers/src/transformers/utils/sentencepiece_model_pb2.py", line 91, in <module>
_descriptor.EnumValueDescriptor(
File "/opt/conda/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 796, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
Error: ShardCannotStart
I tried setting the environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python, and the error I get is the following:
RuntimeError: Llama is supposed to be a BPE model!
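For reference, this is roughly how I applied the workaround, reusing the model definition from the snippet above (a sketch; "<model-id>" is again a placeholder), in case I set it in the wrong place:

# Same deployment as before, but with the pure-Python protobuf
# workaround from the error message added to the container environment
model = HuggingFaceModel(
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "<model-id>",  # placeholder for the actual repo id
        "SM_NUM_GPUS": "1",
        # workaround 2 suggested by the protobuf error above
        "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION": "python",
    },
    role=role,
)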
Can you give me some hints about what is going wrong here?
Is it an instance problem?
Is it a package problem?
Thanks in advance
Actually, I am not very familiar with SageMaker, so I am not entirely sure of the specific cause. However, you can try pulling the latest model files and installing the latest version of Transformers (v4.31.0). If the problem persists, we can investigate further together.
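As a quick sanity check outside SageMaker (a sketch; "<model-id>" is a placeholder for the model repository), you can verify that the tokenizer loads with the newer Transformers, or alternatively downgrade protobuf to 3.20.x as the error message suggests:

# pip install "transformers==4.31.0" sentencepiece
# (or, if you must keep an older transformers, try: pip install "protobuf<=3.20.3")
from transformers import AutoTokenizer

# If this loads without the protobuf TypeError, the tokenizer conversion
# that fails inside the TGI container should work as well
tokenizer = AutoTokenizer.from_pretrained("<model-id>")  # placeholder
print(tokenizer("hello world"))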