Runtime error
e "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 371, in hf_raise_for_status raise HfHubHTTPError(str(e), response=response) from e huggingface_hub.utils._errors.HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://api-inference.huggingface.co/models/skuma307/Llama-3-8B-AWQ-4bit (Request ID: azS57YzQUNRP8_Y2h6pUx) Could not load model skuma307/Llama-3-8B-AWQ-4bit with any of the following classes: (<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>,). See the original errors: while loading with LlamaForCausalLM, an error is thrown: Traceback (most recent call last): File "/src/transformers/src/transformers/pipelines/base.py", line 279, in infer_framework_load_model model = model_class.from_pretrained(model, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/src/transformers/src/transformers/modeling_utils.py", line 3016, in from_pretrained config.quantization_config = AutoHfQuantizer.merge_quantization_configs( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/src/transformers/src/transformers/quantizers/auto.py", line 145, in merge_quantization_configs quantization_config = AutoQuantizationConfig.from_dict(quantization_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/src/transformers/src/transformers/quantizers/auto.py", line 75, in from_dict return target_cls.from_dict(quantization_config_dict) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/src/transformers/src/transformers/utils/quantization_config.py", line 90, in from_dict config = cls(**config_dict) ^^^^^^^^^^^^^^^^^^ File "/src/transformers/src/transformers/utils/quantization_config.py", line 655, in __init__ self.post_init() File "/src/transformers/src/transformers/utils/quantization_config.py", line 662, in post_init raise ValueError("AWQ is only available on GPU") ValueError: AWQ is only available on GPU