8bit and sharded weights
#37
opened by ThreeBlessings
Hi!
I'm updating a lab for a Data-Centric AI course, and it would be great to use this model with the load_in_8bit=True
parameter and have it sharded into ~2 GB weight files for easy use on free-tier Colab GPUs.
Is it planned to add these features?
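For the sharding part, a minimal sketch (assuming a machine with enough CPU RAM to hold the fp16 weights once) is to re-save the checkpoint with transformers' max_shard_size argument, which splits the state dict into ~2 GB files; the output folder name below is just an example:

import torch
from transformers import AutoModelForCausalLM

# load the full checkpoint once in fp16 (no quantization needed at this step)
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

# write the weights back out in ~2 GB shards; the resulting folder can be
# loaded from disk or pushed to the Hub like any other checkpoint
model.save_pretrained("mpt-7b-instruct-sharded", max_shard_size="2GB")

Loading that sharded copy later is the same as loading any checkpoint, but no single weight file has to fit in memory at once.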
I've used this code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
But it gives this error:
ValueError: MPTForCausalLM does not support `device_map='auto'` yet.
I believe this should be fixed now as of this PR: https://huggingface.co/mosaicml/mpt-7b-instruct/discussions/41
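With device_map="auto" supported, a load along these lines should work on a single free-tier Colab GPU (a sketch, assuming accelerate and bitsandbytes are installed; the prompt is just an example):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    load_in_8bit=True,   # int8 weights via bitsandbytes
    device_map="auto",   # automatic layer placement; needs accelerate
)

inputs = tokenizer("Explain data-centric AI in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))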
abhi-mosaic changed discussion status to closed