Error
#1 opened by Hardcore7651
Tried on an A100 and 3 A5000s in text-generation-webui using RunPod; both times I got this:
ImportError: cannot import name 'ExLlamaV2Cache_Q4' from 'exllamav2' (/usr/local/lib/python3.10/dist-packages/exllamav2/__init__.py)
Hi. This appears to be an issue with the exllamav2 version you're running: https://github.com/TheBlokeAI/dockerLLM/issues/17
Though to be fair, I wasn't able to test above the 3.5 quants on my home dual-3090 setup either. I'm running through them now on a RunPod A100 80GB system to make sure they're all good. In the meantime, try updating your exllamav2 with pip as per the instructions in the GitHub issue.
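As a quick way to tell whether your installed exllamav2 is new enough, something like this should work (a sketch; the `check_q4_cache` helper is mine, not part of any tool):

```shell
# Sketch: check whether the installed exllamav2 exposes ExLlamaV2Cache_Q4,
# the class the traceback says is missing.
check_q4_cache() {
  python -c "
import importlib, sys
try:
    mod = importlib.import_module(sys.argv[1])
except ImportError:
    print('not installed')
    sys.exit()
print('ok' if hasattr(mod, 'ExLlamaV2Cache_Q4') else 'too old')
" "$1"
}

check_q4_cache exllamav2   # 'too old' -> upgrade as per the linked issue
```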
I was able to run the 3.0 through 5.0 quants on a RunPod A100 80GB instance using the commands below, so they should all be working fine.
cd /workspace
python -m pip install --upgrade pip
pip uninstall torch torchaudio torchvision -y
# ExllamaV2
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -r requirements.txt
pip install hf_transfer huggingface_hub[hf_transfer]
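One note: installing hf_transfer alone isn't enough; as far as I know, huggingface_hub only uses it when the corresponding environment variable is set:

```shell
# hf_transfer is only used when this is set; otherwise huggingface-cli
# falls back to its default (slower) downloader.
export HF_HUB_ENABLE_HF_TRANSFER=1
```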
# Test Inference on Llama 7B
huggingface-cli download --local-dir-use-symlinks=False --revision 5.0bpw --local-dir turboderp_Llama2-7B-exl2_5.0bpw turboderp/Llama2-7B-exl2
python test_inference.py -m turboderp_Llama2-7B-exl2_5.0bpw -p "Once upon a time,"
rm -r turboderp_Llama2-7B-exl2_5.0bpw
# Download and inference on Midnight quants
## 3.0
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_3.0bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_3.0bpw
python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_3.0bpw -p "Once upon a time,"
rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_3.0bpw
## 3.5
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_3.5bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_3.5bpw
python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_3.5bpw -p "Once upon a time,"
rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_3.5bpw
## 3.75
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_3.75bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_3.75bpw
python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_3.75bpw -p "Once upon a time,"
rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_3.75bpw
## 4.0
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_4.0bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_4.0bpw
python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_4.0bpw -p "Once upon a time,"
rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_4.0bpw
## 4.25
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_4.25bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw
python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_4.25bpw -p "Once upon a time,"
rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_4.25bpw
## 4.5
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_4.5bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_4.5bpw
python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_4.5bpw -p "Once upon a time,"
rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_4.5bpw
## 5.0
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_5.0bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_5.0bpw
python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_5.0bpw -p "Once upon a time,"
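For what it's worth, the per-quant blocks above can be collapsed into a small loop (same commands, just parameterized on the bpw; the helper names are mine):

```shell
# Build the local directory name for a given bpw.
quant_dir() { echo "Dracones_Midnight-Miqu-103B-v1.0_exl2_${1}bpw"; }

# Download one quant, smoke-test it, then remove it to free disk space.
test_quant() {
  local model
  model="$(quant_dir "$1")"
  huggingface-cli download --local-dir-use-symlinks=False \
    --local-dir "$model" "Dracones/Midnight-Miqu-103B-v1.0_exl2_${1}bpw"
  python test_inference.py -m "$model" -p "Once upon a time,"
  rm -r "$model"
}

# Each quant is tens of GB, so the loop is left commented out here:
# for bpw in 3.0 3.5 3.75 4.0 4.25 4.5 5.0; do test_quant "$bpw"; done
```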
Dracones changed discussion status to closed