
8bit and sharded weights

#37 opened by ThreeBlessings

Hi!

I'm updating a lab for a Data-Centric AI course, and it would be great to use this model with the load_in_8bit=True parameter and have the weights sharded into 2 GB files for easy use on free-tier Colab GPUs.

Are there plans to add these features?
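
In the meantime, I think the checkpoint can be re-sharded locally via save_pretrained's max_shard_size argument. A minimal sketch, assuming a machine with enough CPU RAM to hold the fp16 weights (the output directory name is just for illustration):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b-instruct"

# Load on CPU in fp16; no GPU is needed just to re-shard.
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Save the weights split into ~2 GB shards.
model.save_pretrained("mpt-7b-instruct-sharded", max_shard_size="2GB")
tokenizer.save_pretrained("mpt-7b-instruct-sharded")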

I've used this code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load in 8-bit and let Accelerate place the layers automatically.
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             low_cpu_mem_usage=True,
                                             trust_remote_code=True,
                                             load_in_8bit=True,
                                             torch_dtype=torch.float16,
                                             device_map="auto")

But it gives this error:

ValueError: MPTForCausalLM does not support `device_map='auto'` yet.

I believe this should be fixed now as of this PR: https://huggingface.co/mosaicml/mpt-7b-instruct/discussions/41
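
For anyone pinned to an older revision where the error still appears, passing an explicit device map instead of "auto" should bypass the check. A minimal sketch; note that the {"": 0} mapping places the entire model on GPU 0 rather than splitting it across devices:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# An explicit dict device map avoids the device_map='auto' support
# check; {"": 0} assigns every module to GPU 0.
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             trust_remote_code=True,
                                             load_in_8bit=True,
                                             device_map={"": 0})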

abhi-mosaic changed discussion status to closed
