hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

Change max_position_embeddings to original value

#18 opened 2 months ago by

AshtonIsNotHere

Can you provide one model using `group_size=1024` to make the model smaller?

#15 opened 5 months ago by

optimum version cannot support llama3.1 405b

#14 opened 5 months ago by

Atomheart-Father

OOM Error

#13 opened 5 months ago by

Source code to quantize the LLaMA 3.1 405B model

#10 opened 5 months ago by

Quantization `gptq_marlin` does not work (gptq_marlin not found); works after removing it

#7 opened 5 months ago by

Accuracy tradeoff

#6 opened 5 months ago by

Value Error when trying to run

#4 opened 5 months ago by