Instructions for running on runpod.io
by simsim314
I successfully managed to run the 2.75bpw branch on 64GB of VRAM with 4× RTX A4000 GPUs (16GB per GPU).
Here are some key points:
- The template I'm using:
runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
- Download and install exllamav2 (inside Jupyter):
!git clone https://github.com/turboderp/exllamav2
%cd exllamav2
# Optionally, create and activate a new conda environment
!pip install -r requirements.txt
!pip install .
!pip install huggingface_hub
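Before downloading the model it can be worth sanity-checking the install in a quick cell. A minimal sketch (assumes the template above, where torch ships with CUDA support):

# Confirm exllamav2 imports and all four A4000s are visible to CUDA
import torch
import exllamav2

print(torch.__version__, torch.version.cuda)
print("GPUs visible:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"  cuda:{i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")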
- Download the model:
!huggingface-cli download turboderp/dbrx-instruct-exl2 --revision "2.75bpw" --local-dir dbrx_275 --exclude "*.safetensors"
%cd dbrx_275
!wget "https://huggingface.co/turboderp/dbrx-instruct-exl2/resolve/2.75bpw/output-00001-of-00006.safetensors"
!wget "https://huggingface.co/turboderp/dbrx-instruct-exl2/resolve/2.75bpw/output-00002-of-00006.safetensors"
!wget "https://huggingface.co/turboderp/dbrx-instruct-exl2/resolve/2.75bpw/output-00003-of-00006.safetensors"
!wget "https://huggingface.co/turboderp/dbrx-instruct-exl2/resolve/2.75bpw/output-00004-of-00006.safetensors"
!wget "https://huggingface.co/turboderp/dbrx-instruct-exl2/resolve/2.75bpw/output-00005-of-00006.safetensors"
!wget "https://huggingface.co/turboderp/dbrx-instruct-exl2/resolve/2.75bpw/output-00006-of-00006.safetensors"
%cd ..
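wget can leave truncated files behind on a flaky connection, so it's worth confirming that all six shards arrived intact before loading. A minimal check, assuming the dbrx_275 layout above:

# Verify the six safetensors shards are present and report their sizes
from pathlib import Path

shards = sorted(Path("dbrx_275").glob("output-*-of-00006.safetensors"))
assert len(shards) == 6, f"expected 6 shards, found {len(shards)}"
for f in shards:
    print(f.name, f"{f.stat().st_size / 1024**3:.2f} GB")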
Note: I am not using huggingface-cli download for the safetensors files because runpod downloads them into the limited-space container disk (20GB) first.
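If you want to check how much room each mount actually has before choosing a download strategy, a quick cell like this works (the /workspace path is RunPod's usual volume mount; adjust if your pod differs):

# Compare free space on the container disk vs. the attached volume
import shutil

for path in ("/", "/workspace"):
    usage = shutil.disk_usage(path)
    print(f"{path}: {usage.free / 1024**3:.1f} GB free of {usage.total / 1024**3:.1f} GB")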
- Run exllamav2 in a terminal (working directory: exllamav2):
python examples/chat.py -mode chatml -m dbrx_275 --gpu_split auto
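If you'd rather script generation than use the interactive chat example, something along these lines should work. This is a sketch based on exllamav2's bundled inference example, not code from this thread; lazy cache allocation plus load_autosplit is the API equivalent of --gpu_split auto, and max_seq_len is lowered here as an assumption to keep the cache from crowding out the weights on 64GB:

# Minimal scripted generation, mirroring chat.py with --gpu_split auto
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "dbrx_275"
config.prepare()
config.max_seq_len = 4096          # assumption: shrink the cache to fit alongside the weights

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # allocate cache as layers load
model.load_autosplit(cache)                # spread layers across the 4 GPUs automatically

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# DBRX-instruct uses the ChatML prompt format (hence -mode chatml above)
prompt = "<|im_start|>user\nWhat is DBRX?<|im_end|>\n<|im_start|>assistant\n"
print(generator.generate_simple(prompt, settings, 200))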