Tips on loading model with low memory
#3 opened by ryanramos
Was just wondering if anyone's been able to load this model in something akin to a free Colab runtime, i.e. ~12 GB RAM and a Tesla T4? I've tried the code snippet for loading the model in 8-bit precision (with device_map set to "auto") and had no luck. Luckily for me the model is already sharded (I normally can't load an 11B T5 without sharding), but I'm guessing I still can't handle the current shard size.
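For reference, a minimal sketch of the 8-bit loading call I'm trying (the checkpoint name below is just a placeholder for the 11B T5 checkpoint in question, and it assumes accelerate and bitsandbytes are installed):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-xxl"  # placeholder; substitute the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit requires bitsandbytes; device_map="auto" lets accelerate place
# layers across GPU and CPU, offloading whatever doesn't fit in VRAM.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)
```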
If it's just inference, something like https://huggingface.co/bigscience/bloomz/discussions/28 may work!
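That thread is about running inference over the public Petals swarm, where only a small client runs locally and the transformer blocks are served remotely. The client side looks roughly like this (class and checkpoint names follow the Petals BLOOM examples, not necessarily this model):

```python
from transformers import AutoTokenizer
from petals import DistributedBloomForCausalLM

# Checkpoint name taken from the Petals examples; adjust for your model.
model_name = "bigscience/bloomz-petals"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Only the embeddings and a thin client run locally; blocks run on the swarm.
model = DistributedBloomForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```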
Thanks! I actually completely forgot about Petals. Might even use this for a different research project; thanks again!
👍 cc @borzunov
christopher changed discussion status to closed