exl2 quants for meow

This repository contains exl2 (ExLlamaV2) quantized versions of the meow model by Rishiraj Acharya. meow is a fine-tune of SOLAR-10.7B-Instruct-v1.0 on the no_robots dataset.

Current models

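| exl2 BPW | Model Branch | Model Size | Minimum VRAM (4096 Context, fp16 cache) |
|----------|--------------|------------|------------------------------------------|
| 2-Bit    | main         | 3.28 GB    | 6GB GPU                                  |
| 4-Bit    | 4bit         | 5.61 GB    | 8GB GPU                                  |
| 5-Bit    | 5bit         | 6.92 GB    | 10GB GPU, 8GB with swap                  |
| 6-Bit    | 6bit         | 8.23 GB    | 10GB GPU                                 |
| 8-Bit    | 8bit         | 10.84 GB   | 12GB GPU                                 |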
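
If you are unsure which branch to pick, the table above can be turned into a small lookup. This is only a sketch: the helper name `pick_branch` is made up for illustration, the VRAM figures are copied from the table (4096 context, fp16 cache), and the "8GB with swap" option for the 5-bit quant is deliberately ignored.

```python
from typing import Optional

# (branch, minimum VRAM in GB) copied from the table above,
# ordered from the smallest quant to the largest.
QUANTS = [
    ("main", 6),   # 2-bit quant lives on the main branch
    ("4bit", 8),
    ("5bit", 10),  # the "8GB with swap" option is not modelled here
    ("6bit", 10),
    ("8bit", 12),
]


def pick_branch(vram_gb: float) -> Optional[str]:
    """Return the largest quant branch whose minimum VRAM fits, or None."""
    best = None
    for branch, min_vram in QUANTS:
        if vram_gb >= min_vram:
            best = branch  # later entries are larger quants, so keep the last fit
    return best
```

For example, `pick_branch(12)` gives the 8-bit branch, while a card with less than 6GB gets `None` because no quant in the table fits.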

Note

Using a 12GB Nvidia GeForce RTX 3060, I got around 20 tokens per second on average with the 8-bit quant at the full 4096 context.

Where to use

There are several places you can use an exl2 model. Here are a few:

How to download:

oobabooga's downloader

Use something like download-model.py to download with Python requests.
Install requirements:

pip install requests tqdm

Example for downloading 5bpw:

python download-model.py Anthonyg5005/rishiraj-meow-10.7B-exl2:5bit

huggingface-cli

You can also use huggingface-cli.
To get it, install the huggingface-hub Python package:

pip install huggingface-hub

Example for 5bpw:

huggingface-cli download Anthonyg5005/rishiraj-meow-10.7B-exl2 --local-dir rishiraj-meow-10.7B-exl2-5bpw --revision 5bit
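
The same download can also be done from Python with `snapshot_download` from the huggingface-hub package. The sketch below mirrors the command above; the helper names (`branch_for`, `download_kwargs`, `download_quant`) are made up for illustration, and the local directory name just follows the pattern used in the CLI example.

```python
REPO_ID = "Anthonyg5005/rishiraj-meow-10.7B-exl2"
AVAILABLE_BITS = (2, 4, 5, 6, 8)  # quants listed under "Current models"


def branch_for(bits: int) -> str:
    """Map a bit width to its branch; the 2-bit quant is on main."""
    if bits not in AVAILABLE_BITS:
        raise ValueError(f"no {bits}-bit quant in this repo")
    return "main" if bits == 2 else f"{bits}bit"


def download_kwargs(bits: int) -> dict:
    """Build the arguments passed to snapshot_download."""
    return {
        "repo_id": REPO_ID,
        "revision": branch_for(bits),
        "local_dir": f"rishiraj-meow-10.7B-exl2-{bits}bpw",
    }


def download_quant(bits: int) -> str:
    """Download one quant branch and return the local folder path."""
    # Imported here so the helpers above work without huggingface-hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(bits))


# download_quant(5)  # uncomment to fetch the 5bpw files
```

Calling `download_quant(5)` should fetch the `5bit` branch into `rishiraj-meow-10.7B-exl2-5bpw`, the same result as the huggingface-cli example.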

Git LFS (not recommended)

I recommend the HTTP downloaders over git: they can resume failed downloads and are much easier to work with.
Make sure to have git and git LFS installed.
Example for 5bpw download with git:

Make sure LFS file skipping is disabled:

# windows
set GIT_LFS_SKIP_SMUDGE=0
# linux
export GIT_LFS_SKIP_SMUDGE=0

Clone repo branch

git clone https://huggingface.co/Anthonyg5005/rishiraj-meow-10.7B-exl2 -b 5bit