Tips on loading model with low memory
#3 opened by ryanramos
Was just wondering if anyone's been able to load this model in something akin to a free Colab runtime, i.e. ~12 GB RAM and a Tesla T4? I've tried the code snippet for loading the model in 8-bit precision (with device_map set to "auto") and had no luck. Luckily for me the model is already sharded (I normally can't load an 11B T5 without sharding), but I'm guessing I still can't handle the current shard size.
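For reference, a minimal sketch of the 8-bit loading call I'm trying (the checkpoint name below is just a placeholder for the 11B T5 checkpoint in question, and it assumes accelerate and bitsandbytes are installed):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-xxl"  # placeholder; substitute the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit requires bitsandbytes; device_map="auto" lets accelerate place
# layers across GPU and CPU, offloading whatever doesn't fit in VRAM.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)
```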
If it's just inference, something like https://huggingface.co/bigscience/bloomz/discussions/28 may work!
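That thread is about running inference over the public Petals swarm, where only a small client runs locally and the transformer blocks are served remotely. The client side looks roughly like this (class and checkpoint names follow the Petals BLOOM examples, not necessarily this model):

```python
from transformers import AutoTokenizer
from petals import DistributedBloomForCausalLM

# Checkpoint name taken from the Petals examples; adjust for your model.
model_name = "bigscience/bloomz-petals"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Only the embeddings and a thin client run locally; blocks run on the swarm.
model = DistributedBloomForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```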
Thanks! I actually completely forgot about Petals. Might even use this for a different research project; thanks again!
👍 cc @borzunov
christopher changed discussion status to closed